Endpoint class.
Parameter overview
| Parameter | Type | Description | Default |
|---|---|---|---|
| name | str | Endpoint name (required unless id= is used) | - |
| id | str | Connect to existing endpoint by ID | None |
| gpu | GpuGroup, GpuType, or list | GPU type(s) for the endpoint | GpuGroup.ANY |
| cpu | str or CpuInstanceType | CPU instance type (mutually exclusive with gpu) | None |
| workers | int or (min, max) | Worker scaling configuration | (0, 1) |
| idle_timeout | int | Seconds before scaling down idle workers | 60 |
| dependencies | list[str] | Python packages to install | None |
| system_dependencies | list[str] | System packages to install (apt) | None |
| accelerate_downloads | bool | Enable download acceleration | True |
| volume | NetworkVolume | Network volume for persistent storage | None |
| datacenter | DataCenter | Preferred datacenter | EU_RO_1 |
| env | dict[str, str] | Environment variables | None |
| gpu_count | int | GPUs per worker | 1 |
| execution_timeout_ms | int | Max execution time in milliseconds | 0 (no limit) |
| flashboot | bool | Enable Flashboot fast startup | True |
| image | str | Custom Docker image to deploy | None |
| scaler_type | ServerlessScalerType | Scaling strategy | auto |
| scaler_value | int | Scaling threshold | 4 |
| template | PodTemplate | Pod template overrides | None |
Parameter details
name
Type: str
Required: Yes (unless id= is specified)
The endpoint name visible in the Runpod console. Use descriptive names to easily identify endpoints.
id
Type: str
Default: None
Connect to an existing deployed endpoint by its ID. When id is specified, name is not required.
gpu
Type: GpuGroup, GpuType, or list[GpuGroup | GpuType]
Default: GpuGroup.ANY (if neither gpu nor cpu is specified)
Specifies GPU hardware for the endpoint. Accepts a single GPU type/group or a list for fallback strategies.
cpu
Type: str or CpuInstanceType
Default: None
Specifies a CPU instance type. Mutually exclusive with gpu.
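The gpu and cpu rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not Flash's own validation code: resolve_hardware is a hypothetical helper, and plain strings stand in for GpuGroup, GpuType, and CpuInstanceType values.

```python
def resolve_hardware(gpu=None, cpu=None):
    """Hypothetical helper mirroring the documented gpu/cpu rules.

    Plain strings stand in for GpuGroup/GpuType/CpuInstanceType values.
    """
    if gpu is not None and cpu is not None:
        # The two parameters are mutually exclusive.
        raise ValueError("gpu and cpu are mutually exclusive")
    if cpu is not None:
        return {"kind": "cpu", "instance": cpu}
    if gpu is None:
        # Neither was given: fall back to the documented default (GpuGroup.ANY).
        gpu = "ANY"
    # A single value and a list are both accepted; normalize to a list
    # so the fallback order is explicit.
    choices = list(gpu) if isinstance(gpu, (list, tuple)) else [gpu]
    return {"kind": "gpu", "choices": choices}
```

Normalizing to a list keeps the single-value and fallback-list cases on one code path.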
workers
Type: int or tuple[int, int]
Default: (0, 1)
Controls worker scaling. Accepts either a single integer (max workers with min=0) or a tuple of (min, max).
- workers=N or workers=(0, N): Cost-optimized, allows scale to zero
- workers=(1, N): Avoid cold starts by keeping at least one worker warm
- workers=(N, N): Fixed worker count for consistent performance
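To make the two accepted forms concrete, here is a minimal sketch of how an integer or a tuple could be expanded to a (min, max) pair. The helper name normalize_workers and the 0 <= min <= max check are assumptions made for illustration; the document does not describe how Flash validates this value.

```python
def normalize_workers(workers=(0, 1)):
    """Hypothetical helper: expand the workers argument to (min, max)."""
    if isinstance(workers, int):
        # A bare integer is the maximum; the minimum defaults to 0.
        return (0, workers)
    min_workers, max_workers = workers
    if min_workers < 0 or max_workers < min_workers:
        # Assumed sanity check, not taken from the Flash SDK.
        raise ValueError("expected 0 <= min <= max")
    return (min_workers, max_workers)


print(normalize_workers(3))       # (0, 3)
print(normalize_workers((1, 5)))  # (1, 5)
print(normalize_workers((2, 2)))  # (2, 2)
```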
idle_timeout
Type: int
Default: 60
Number of seconds a worker stays running after completing a request, waiting for additional requests, before the endpoint scales down to its minimum worker count.
- 30-60 seconds: Cost-optimized, infrequent traffic
- 60-120 seconds: Balanced, variable traffic patterns
- 120-300 seconds: Latency-optimized, consistent traffic
dependencies
Type: list[str]
Default: None
Python packages to install on the remote worker before executing your function. Supports standard pip syntax.
system_dependencies
Type: list[str]
Default: None
System-level packages to install via apt before your function runs.
accelerate_downloads
Type: bool
Default: True
Enables faster downloads for dependencies, models, and large files. Disable if you encounter compatibility issues.
volume
Type: NetworkVolume
Default: None
Attaches a network volume for persistent storage. Volumes are mounted at /runpod-volume/. Flash uses the volume name to find an existing volume or create a new one.
- Share large models across workers
- Persist data between runs
- Share datasets across endpoints
datacenter
Type: DataCenter
Default: DataCenter.EU_RO_1
Preferred datacenter for worker deployment.
Flash Serverless deployments are currently restricted to EU-RO-1.
env
Type: dict[str, str]
Default: None
Environment variables passed to all workers. Useful for API keys, configuration, and feature flags.
Workers can read these values through os.environ.
Environment variables are excluded from configuration hashing. Changing environment values won’t trigger endpoint recreation, making it easy to rotate API keys.
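The effect of leaving env out of the configuration hash can be illustrated with a short sketch. The hashing scheme Flash actually uses is not described here; SHA-256 over sorted JSON, the config_hash name, and the sample values are all assumptions chosen for the example.

```python
import hashlib
import json


def config_hash(config, excluded=("env",)):
    """Hypothetical sketch: hash a config dict while skipping excluded keys."""
    relevant = {k: v for k, v in config.items() if k not in excluded}
    # sort_keys gives a stable serialization regardless of dict order.
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = {"name": "my-endpoint", "gpu_count": 1, "env": {"API_KEY": "old"}}
after = {**before, "env": {"API_KEY": "new"}}

# Rotating the key leaves the hash unchanged, so nothing is recreated.
print(config_hash(before) == config_hash(after))  # True
```

Under this model, a change to any other field (for example gpu_count) produces a different hash.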
gpu_count
Type: int
Default: 1
Number of GPUs per worker. Use for multi-GPU workloads.
execution_timeout_ms
Type: int
Default: 0 (no limit)
Maximum execution time for a single job in milliseconds. Jobs exceeding this timeout are terminated.
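The same value also governs how long the synchronous client waits, as the runsync() note in this section describes. A minimal sketch of that rule follows; runsync_client_timeout is a hypothetical name used for illustration, not a function in the Flash SDK.

```python
DEFAULT_CLIENT_TIMEOUT_S = 60  # documented fallback when no limit is set


def runsync_client_timeout(execution_timeout_ms=0):
    """Hypothetical helper: client-side HTTP timeout in seconds."""
    if execution_timeout_ms and execution_timeout_ms > 0:
        # A positive value is used as the wait duration.
        return execution_timeout_ms / 1000
    # Unset (None) or 0 falls back to the 60-second default.
    return DEFAULT_CLIENT_TIMEOUT_S


print(runsync_client_timeout(300_000))  # 300.0
print(runsync_client_timeout(0))        # 60
```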
The Flash SDK’s runsync() method uses your execution_timeout_ms value as the client-side HTTP timeout. If set to a positive value, the SDK waits that duration for the job to complete. If unset or set to 0, the SDK defaults to a 60-second timeout. For long-running inference jobs, set execution_timeout_ms to prevent premature timeouts.
flashboot
Type: bool
Default: True
Enables Flashboot for faster cold starts by pre-loading container images.
Set to False for debugging or compatibility reasons.
image
Type: str
Default: None
Custom Docker image to deploy. When specified, the endpoint runs your Docker image instead of Flash’s managed workers.
scaler_type
Type: ServerlessScalerType
Default: Auto-selected based on endpoint type
Scaling algorithm strategy. Defaults are automatically set:
- Queue-based: QUEUE_DELAY (scales based on queue depth)
- Load-balanced: REQUEST_COUNT (scales based on active requests)
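The default selection above can be expressed as a small lookup. In this sketch, ScalerType is a local stand-in for ServerlessScalerType rather than an import from the SDK, and the "queue" and "load_balanced" labels are hypothetical names for the two endpoint types.

```python
from enum import Enum


class ScalerType(Enum):
    """Local stand-in for ServerlessScalerType (not imported from the SDK)."""
    QUEUE_DELAY = "QUEUE_DELAY"
    REQUEST_COUNT = "REQUEST_COUNT"


def default_scaler(endpoint_kind, scaler_type=None):
    """Hypothetical helper: use the documented default unless overridden."""
    if scaler_type is not None:
        return scaler_type
    defaults = {
        "queue": ScalerType.QUEUE_DELAY,            # queue-based endpoints
        "load_balanced": ScalerType.REQUEST_COUNT,  # load-balanced endpoints
    }
    return defaults[endpoint_kind]
```

An explicit scaler_type always wins; the lookup applies only when none is given.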
scaler_value
Type: int
Default: 4
Parameter value for the scaling algorithm. With QUEUE_DELAY, it represents the target number of jobs per worker before scaling up.
template
Type: PodTemplate
Default: None
Advanced pod configuration overrides.
PodTemplate
PodTemplate provides advanced pod configuration options:
| Parameter | Type | Description | Default |
|---|---|---|---|
| containerDiskInGb | int | Container disk size in GB | 64 |
| env | list[dict] | Environment variables as list of {"key": "...", "value": "..."} | None |
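Note that PodTemplate takes env as a list of key/value entries, while the Endpoint-level env parameter is a plain dict. If you keep variables in a dict, a conversion along these lines may be convenient. Both helper names are hypothetical and not part of the SDK.

```python
def env_to_template_list(env):
    """Convert {"NAME": "value"} into [{"key": "NAME", "value": "value"}]."""
    return [{"key": key, "value": value} for key, value in env.items()]


def template_list_to_env(entries):
    """Inverse conversion: list of key/value entries back to a dict."""
    return {entry["key"]: entry["value"] for entry in entries}


entries = env_to_template_list({"LOG_LEVEL": "debug", "MODE": "test"})
print(entries)
# [{'key': 'LOG_LEVEL', 'value': 'debug'}, {'key': 'MODE', 'value': 'test'}]
```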
EndpointJob
When using Endpoint(id=...) or Endpoint(image=...), the .run() method returns an EndpointJob object for async operations.
Configuration change behavior
When you change configuration and redeploy, Flash automatically updates your endpoint.
Changes that recreate workers
These changes restart all workers:
- GPU configuration (gpu, gpu_count)
- CPU instance type (cpu)
- Docker image (image)
- Storage (volume)
- Datacenter (datacenter)
- Flashboot setting (flashboot)
Changes that update settings only
These changes apply immediately with no downtime:
- Worker scaling (workers)
- Timeouts (idle_timeout, execution_timeout_ms)
- Scaler settings (scaler_type, scaler_value)
- Environment variables (env)
- Endpoint name (name)
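The two categories in this section can be turned into a quick pre-deploy check. The sketch below compares two plain dicts of parameters and reports which changed fields would restart workers. classify_changes is a hypothetical helper, and parameters that this section does not list (such as dependencies or template) are deliberately left unclassified.

```python
# Field groups copied from the two lists in this section.
RECREATE_FIELDS = {"gpu", "gpu_count", "cpu", "image",
                   "volume", "datacenter", "flashboot"}
UPDATE_ONLY_FIELDS = {"workers", "idle_timeout", "execution_timeout_ms",
                      "scaler_type", "scaler_value", "env", "name"}


def classify_changes(old, new):
    """Hypothetical helper: group changed fields by their documented effect."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    return {
        "restart_workers": sorted(changed & RECREATE_FIELDS),
        "settings_only": sorted(changed & UPDATE_ONLY_FIELDS),
    }


old = {"gpu_count": 1, "idle_timeout": 60, "env": {"A": "1"}}
new = {"gpu_count": 2, "idle_timeout": 120, "env": {"A": "1"}}
print(classify_changes(old, new))
# {'restart_workers': ['gpu_count'], 'settings_only': ['idle_timeout']}
```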