Created
December 9, 2020 18:53
-
-
Save zduymz/c9539a604c522eccdee8b8a890cd8397 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Step 0. Load available DAG definitions from disk (fill DagBag) | |
While the scheduler is running: | |
Step 1. The scheduler uses the DAG definitions to | |
identify and/or initialize any DagRuns in the | |
metadata db. | |
Step 2. The scheduler checks the states of the | |
TaskInstances associated with active DagRuns, | |
resolves any dependencies amongst TaskInstances, | |
identifies TaskInstances that need to be executed, | |
and adds them to a worker queue, updating the status | |
of newly-queued TaskInstances to "queued" in the | |
datbase. | |
Step 3. Each available worker pulls a TaskInstance from | |
the queue and starts executing it, updating the | |
database record for the TaskInstance from "queued" | |
to "running". | |
Step 4. Once a TaskInstance is finished running, the | |
associated worker reports back to the queue | |
and updates the status for the TaskInstance | |
in the database (e.g. "finished", "failed", | |
etc.) | |
Step 5. The scheduler updates the states of all active | |
DagRuns ("running", "failed", "finished") according | |
to the states of all completed associated | |
TaskInstances. | |
Step 6. Repeat Steps 1-5 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment