With non-isolated runs, each run executes as a thread inside the long-running code server process. Memory and CPU are shared across all concurrent runs. This guide covers how to profile your runs, size your ECS tasks, configure Dagster's gRPC settings, and decide when to split into multiple replicas.
total_memory = base_code_server_mem + (avg_run_peak_mem × max_concurrent_runs)
total_cpu = base_code_server_cpu + (avg_run_cpu × max_concurrent_runs)
Run one job in isolation and measure its peak resource usage with a dedicated profiling asset:
import tracemalloc
import psutil
import os
from dagster import asset
@asset
def profiled_asset(context):
tracemalloc.start()
proc = psutil.Process(os.getpid())
mem_before = proc.memory_info().rss / 1024**2 # MB
# ... your actual work ...
mem_after = proc.memory_info().rss / 1024**2
peak_mem = tracemalloc.get_traced_memory()[1] / 1024**2
tracemalloc.stop()
context.log.info(f"RSS delta: {mem_after - mem_before:.1f} MB")
context.log.info(f"Peak traced: {peak_mem:.1f} MB")You can also use CloudWatch Container Insights if enabled. Keep this asset in a dev or staging deployment so you can materialize it before making sizing decisions.
avg_run_duration_seconds = 45 # measure from run history
runs_per_hour = 200 # expected throughput
avg_concurrent = (avg_run_duration_seconds / 3600) * runs_per_hour
peak_concurrent = avg_concurrent * 2.5 # ~2.5x burst headroom
print(f"Target concurrency: {peak_concurrent:.0f}")| Concurrent Runs | Suggested Task Size | Notes |
|---|---|---|
| 5–10 light runs | 2 vCPU / 4–8 GB | Good starting point |
| 10–25 medium runs | 4 vCPU / 16 GB | Common production size |
| 25–50 runs | 8 vCPU / 32 GB | Consider replicas instead |
| 50+ | Multiple replicas | Vertical scaling hits diminishing returns |
For Dagster+ ECS, set server_resources (the long-running code server task) separately from run_resources (only used for isolated runs):
locations:
- location_name: my-code-location
image: my-image:latest
code_source:
package_name: my_package
container_context:
ecs:
server_resources: # this is the non-isolated run host
cpu: 2048 # 2 vCPU
memory: 8192 # 8 GB
run_resources: # only relevant for isolated runs
cpu: 1024
memory: 4096Cap concurrency in Dagster to match your task capacity. Without this, ECS will OOM-kill the task under burst load in the Dagster+ UI under Deployment → Run Coordinator.
DAGSTER_GRPC_MAX_WORKERS controls the thread pool size on the code server. Set this as an environment variable on the code server ECS task — either directly in the task definition or via dagster_cloud.yaml:
container_context:
ecs:
env_vars:
- DAGSTER_GRPC_MAX_WORKERS=20
server_resources:
cpu: 2048
memory: 8192The gRPC server uses a ThreadPoolExecutor. A run cannot start until a worker thread picks it up, making this the primary gate on concurrent run count. The default is Python's ThreadPoolExecutor default: num_vCPUs × 5.
The GIL is a performance ceiling, not a hard limit. It serializes Python bytecode execution across threads:
- I/O-bound runs (DB queries, API calls, dbt Cloud): threads release the GIL while waiting, so concurrency scales well. 20–40+ concurrent runs on a single server is realistic.
- CPU-bound runs (pandas, heavy Python computation): threads hold the GIL while computing. Throughput plateaus at roughly 1 vCPU equivalent regardless of thread count. Adding more threads adds overhead with no throughput gain past your vCPU count.
The GIL degrades throughput gradually — it doesn't crash the server. Memory is the hard wall.
Set DAGSTER_GRPC_MAX_WORKERS ≥ max_concurrent_runs so the gRPC thread pool is never the bottleneck. Then use throughput metrics (see below) to validate.
| Constraint | Type | Effect when hit |
|---|---|---|
DAGSTER_GRPC_MAX_WORKERS |
Hard | Runs queue, don't start |
| ECS task memory | Hard | OOM kill — task restarts |
| Thread stack memory | Soft (~8 MB/thread) | Gradual memory pressure |
| GIL contention (CPU-bound) | Performance ceiling | Throughput plateaus, not crashes |
In your CloudFormation template, set:
AgentMetricsEnabled: true
CodeServerMetricsEnabled: true
This emits structured logs to CloudWatch:
{
"request_utilization": {
"max_concurrent_requests": 20,
"num_running_requests": 12,
"num_queued_requests": 3
}
}num_queued_requests > 0 consistently means your DAGSTER_GRPC_MAX_WORKERS is too low or your task is undersized.
- You want fault isolation — one OOM kills only that replica's in-flight runs, not everything
- Runs are mostly CPU-bound — each replica is a separate process with its own GIL, so you get real parallelism
- You want clean rolling deploys (one replica drains at a time)
- Run duration is long (>5 min) — blast radius of a single-server OOM is too costly
- Runs share expensive in-memory state (loaded ML models, large lookup tables) that would be costly to reload per process
- Runs are short and idempotent — OOM blast radius is low and retries are cheap
| Run type | Single server limit | Then do |
|---|---|---|
| Mostly I/O-bound | ~20–30 concurrent | Add a second replica |
| Mixed I/O + CPU | ~10–15 concurrent | 2–3 replicas |
| Mostly CPU-bound | ~4–8 concurrent | 3–4 smaller replicas |
| Long-running (>5 min) | ~10 | Replicas for fault isolation |
avg_concurrent_runs = 15
# Single server OOM: lose all 15 in-flight runs, task restarts
# 3 replicas, one OOM: lose ~5, other 2 replicas keep running immediatelyWith multiple replicas, the ECS agent distributes gRPC requests via round-robin DNS. Set your global max_concurrent_runs in the Dagster+ deployment settings to account for all replicas
Start with 2 replicas of a moderately sized task (4 vCPU / 16 GB), each handling ~15 concurrent runs. This gives you fault isolation, clean rolling deploys, and room to scale to 3–4 replicas before needing to rethink task sizing. Only consolidate to a single large server if shared in-memory state makes per-replica memory cost prohibitive.
| Setting | Where | Purpose |
|---|---|---|
server_resources.cpu/memory |
dagster_cloud.yaml / container_context.yaml |
Code server ECS task size |
DAGSTER_GRPC_MAX_WORKERS |
Code server container env var | gRPC thread pool size |
max_concurrent_runs |
Dagster+ UI | Global run concurrency cap |
AgentMetricsEnabled |
CloudFormation param | Enable agent throughput logs |
CodeServerMetricsEnabled |
CloudFormation param | Enable code server throughput logs |