Skip to content

Instantly share code, notes, and snippets.

@zduymz
Created December 9, 2020 18:53
Show Gist options
  • Save zduymz/c9539a604c522eccdee8b8a890cd8397 to your computer and use it in GitHub Desktop.
Save zduymz/c9539a604c522eccdee8b8a890cd8397 to your computer and use it in GitHub Desktop.
Step 0. Load available DAG definitions from disk (fill DagBag)
While the scheduler is running:
Step 1. The scheduler uses the DAG definitions to
identify and/or initialize any DagRuns in the
metadata db.
Step 2. The scheduler checks the states of the
TaskInstances associated with active DagRuns,
resolves any dependencies amongst TaskInstances,
identifies TaskInstances that need to be executed,
and adds them to a worker queue, updating the status
of newly-queued TaskInstances to "queued" in the
datbase.
Step 3. Each available worker pulls a TaskInstance from
the queue and starts executing it, updating the
database record for the TaskInstance from "queued"
to "running".
Step 4. Once a TaskInstance is finished running, the
associated worker reports back to the queue
and updates the status for the TaskInstance
in the database (e.g. "finished", "failed",
etc.)
Step 5. The scheduler updates the states of all active
DagRuns ("running", "failed", "finished") according
to the states of all completed associated
TaskInstances.
Step 6. Repeat Steps 1-5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment