Call your deployed Flash endpoints using HTTP requests for queue-based and load-balanced configurations.
After deploying your Flash app with flash deploy, you can call your endpoints directly via HTTP. The request format depends on whether you’re using queue-based or load-balanced configurations.
Queue-based endpoints (created with the @Endpoint(name=..., gpu=...) decorator) expose two routes for job submission: /run (asynchronous) and /runsync (synchronous).
Both routes return a JSON job object. A completed job looks like this:

```json
{
  "id": "job-abc-123",
  "status": "COMPLETED",
  "output": {
    "generated_text": "Hello world from GPU!"
  }
}
```
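To consume this response programmatically, parse the JSON and branch on the status field. A minimal sketch, using the sample response above as input:

```python
import json

# Example response body, taken from the sample above
response_body = (
    '{"id": "job-abc-123", "status": "COMPLETED",'
    ' "output": {"generated_text": "Hello world from GPU!"}}'
)

job = json.loads(response_body)
if job["status"] == "COMPLETED":
    result = job["output"]["generated_text"]
    print(result)  # Hello world from GPU!
else:
    print(f"Job {job['id']} finished with status {job['status']}")
```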
The /runsync endpoint has a 60-second client-side timeout by default. If you’ve configured execution_timeout_ms on your endpoint, the client timeout uses that value instead. For jobs that take longer than 60 seconds, set execution_timeout_ms to prevent /runsync requests from timing out.
Use /run for long-running jobs that you’ll check later. Use /runsync for quick jobs where you want immediate results (with timeout protection).
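For the asynchronous /run flow, a client typically polls the job until it reaches a terminal state. The helper below is a generic sketch: get_status stands in for whatever call fetches the job record (for example, an HTTP GET against the endpoint's status route), and the terminal status names besides COMPLETED are assumptions, not confirmed by this page:

```python
import time

# Assumed terminal states; COMPLETED matches the example response above
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}

def wait_for_job(get_status, poll_interval=2.0, timeout=300.0):
    """Poll get_status() until the job reaches a terminal state or timeout expires.

    get_status: zero-argument callable returning a job dict like
                {"id": ..., "status": ..., "output": ...}
    """
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"job {job.get('id')} did not finish within {timeout}s")
        time.sleep(poll_interval)
```

In practice, get_status would wrap an authenticated request for the job's current state; injecting it as a callable keeps the polling logic testable without a live endpoint.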
A single load-balanced endpoint can serve multiple routes:
```python
from runpod_flash import Endpoint

api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))

# All these routes share one endpoint URL
@api.post("/generate")
async def generate_text(prompt: str): ...

@api.post("/translate")
async def translate_text(text: str): ...

@api.get("/health")
async def health_check(): ...
```
```bash
# All use the same base URL with different paths
curl -X POST https://abc123xyz.api.runpod.ai/generate -H "..." -d '{...}'
curl -X POST https://abc123xyz.api.runpod.ai/translate -H "..." -d '{...}'
curl -X GET https://abc123xyz.api.runpod.ai/health -H "..."
```
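The same calls can be made from Python with the standard library. The sketch below only builds the request object without sending it; the base URL is the example deployment URL from the curl commands above, and the Bearer-token Authorization header and /generate payload shape are assumptions to replace with your deployment's actual values:

```python
import json
import urllib.request

BASE_URL = "https://abc123xyz.api.runpod.ai"   # example URL from above
API_KEY = "YOUR_RUNPOD_API_KEY"                # placeholder; assumed Bearer auth

payload = json.dumps({"prompt": "Hello"}).encode()  # assumed request body shape
req = urllib.request.Request(
    f"{BASE_URL}/generate",
    data=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here
```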