@slopp
Created February 16, 2023 17:01
Dagster: Controlling Parallelism within a Run

Dagster has many ways to control parallelism. In Dagster Cloud deployments, you can control how many concurrent runs can happen at one time through deployment settings.

Within a run, you can also control how many operations execute in parallel. By default, runs use the multi-process executor, and the number of parallel operations within a run is based on the number of CPUs available: the executor's max_concurrent setting defaults to the value returned by multiprocessing.cpu_count(). For example, if you are using Dagster Cloud Hybrid with Kubernetes, the number of parallel operations within a run will be based on the resources available in a pod.
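To see what that default would work out to on a given machine or pod, it can help to print the CPU counts that Python reports. The following is a minimal sketch using only the standard library. It assumes the default is derived from multiprocessing.cpu_count(), and the helper name visible_cpus is hypothetical, not part of Dagster.

```python
import multiprocessing
import os


def visible_cpus() -> dict:
    """Report the CPU counts Python can see (hypothetical helper, not a Dagster API)."""
    counts = {"cpu_count": multiprocessing.cpu_count()}
    # sched_getaffinity exists only on some platforms (for example Linux). It
    # reflects CPU pinning, which can be lower than the machine-wide count.
    if hasattr(os, "sched_getaffinity"):
        counts["affinity"] = len(os.sched_getaffinity(0))
    return counts


if __name__ == "__main__":
    print(visible_cpus())
```

Inside a container, CPU limits are not necessarily reflected in these numbers, so setting max_concurrent explicitly is likely the more predictable option.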

This behavior can be changed by modifying the executor settings.

Default

You can specify the default concurrency used by the executor for all jobs by configuring the executor that is used within a project:

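from dagster import Definitions, multiprocess_executor

defs = Definitions(
    # Allow at most one step to execute at a time within each run.
    executor=multiprocess_executor.configured({"max_concurrent": 1}),
    assets=[my_assets],  # my_assets is defined elsewhere in the project
    # ... other Definitions arguments (jobs, schedules, resources)
)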

Per Job: In Code

You can override this configuration for a specific job in code. For example:

from dagster import AssetSelection, define_asset_job

analytics_job = define_asset_job(
    name="refresh_analytics_model_job",
    selection=AssetSelection.keys(["ANALYTICS", "daily_order_summary"]).upstream(),
    tags={"dagster/max_retries": "1"},
    # Limit this job to one step at a time.
    config={"execution": {"config": {"multiprocess": {"max_concurrent": 1}}}},
)
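The nested config dictionary is easy to mistype, so one option is to build it with a small helper. The sketch below uses only the standard library. The name multiprocess_run_config is hypothetical, the key layout simply mirrors the dictionary in the job above, and rejecting values below 1 is this sketch's own restriction rather than a documented Dagster rule.

```python
def multiprocess_run_config(max_concurrent: int) -> dict:
    """Build the run config that caps parallel steps within a single run.

    Hypothetical helper: it only reproduces the key layout used in the job above.
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_concurrent, bool) or not isinstance(max_concurrent, int):
        raise TypeError("max_concurrent must be an integer")
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1 in this sketch")
    return {
        "execution": {"config": {"multiprocess": {"max_concurrent": max_concurrent}}}
    }


if __name__ == "__main__":
    print(multiprocess_run_config(1))
```

The returned dictionary can be passed as the config argument of define_asset_job in place of the literal.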

Per Job: Ad-hoc Launches

You can also override this configuration for a specific run via the launchpad. When you launch a run for a job (or materialize a set of assets by shift-clicking the Materialize button), supply the following in the launchpad:

execution:
  config:
    multiprocess:
      max_concurrent: 1
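If the same setting already lives in code as a dictionary, a small renderer can produce the launchpad YAML so the two stay in sync. This is a rough sketch with a hypothetical function name, to_yaml. It handles only nested dictionaries with integer or plain string leaves (booleans, lists, and strings that need quoting are out of scope), and a real project would more likely use a YAML library.

```python
def to_yaml(config: dict, indent: int = 0) -> str:
    """Render a nested dict as indented YAML (hypothetical, deliberately minimal)."""
    lines = []
    for key, value in config.items():
        prefix = " " * indent + f"{key}:"
        if isinstance(value, dict):
            # A nested mapping goes on the following lines, indented two spaces.
            lines.append(prefix)
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{prefix} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    run_config = {"execution": {"config": {"multiprocess": {"max_concurrent": 1}}}}
    print(to_yaml(run_config))
```

Run as a script, this prints the same four lines of YAML shown above, ready to paste into the launchpad.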