This example shows a skeleton for how to build a Dagster project that extracts tables from SQL Server, stores the extract as a CSV in GCS, and then uploads the GCS extract to BigQuery.
The actual extract and load logic is omitted; the purpose of this project is to show how such a pipeline can be represented as Dagster assets.
First, a single pipeline for one table is created. This is demonstrated in the file `dagster_mock_one_table.py`. To run this example:

- Create a Python virtual environment and then run:

  ```shell
  pip install dagster dagster-webserver
  ```

- Copy the contents of `dagster_mock_one_table.py` to a file with the same name locally, then run:

  ```shell
  dagster dev -f dagster_mock_one_table.py
  ```
The result in Dagster's webserver looks like this:

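For reference, here is a minimal sketch of how one table's extract-and-load steps can be modeled as Dagster assets. This is not the contents of `dagster_mock_one_table.py`; the asset and table names (`orders_gcs_extract`, `orders_bq_table`) are illustrative, and the extract/load logic is mocked out:

```python
import dagster as dg


@dg.asset
def orders_gcs_extract() -> None:
    # Mock: extract the `orders` table from SQL Server and write it as a CSV to GCS.
    pass


@dg.asset(deps=[orders_gcs_extract])
def orders_bq_table() -> None:
    # Mock: load the GCS CSV into a BigQuery table.
    pass


defs = dg.Definitions(assets=[orders_gcs_extract, orders_bq_table])
```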
The second example, `dagster_mock_many_tables.py`, shows how to build on the first example by creating an asset factory that dynamically generates assets for each table. Follow the same steps as above, then run:

```shell
dagster dev -f dagster_mock_many_tables.py
```
The result:

The run logs for a run that targets all of these assets:

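For illustration, an asset factory along these lines could look like the following sketch. This is not the contents of `dagster_mock_many_tables.py`; the table list and asset names are hypothetical, and the extract/load logic is mocked out:

```python
import dagster as dg

# Hypothetical list of SQL Server tables to replicate.
TABLES = ["orders", "customers", "products"]


def build_table_assets(table: str) -> list[dg.AssetsDefinition]:
    @dg.asset(name=f"{table}_gcs_extract")
    def gcs_extract() -> None:
        # Mock: extract `table` from SQL Server and write it as a CSV to GCS.
        pass

    @dg.asset(name=f"{table}_bq_table", deps=[gcs_extract])
    def bq_table() -> None:
        # Mock: load the GCS CSV for `table` into a BigQuery table.
        pass

    return [gcs_extract, bq_table]


all_assets = [asset for table in TABLES for asset in build_table_assets(table)]
defs = dg.Definitions(assets=all_assets)
```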
With Dagster, you get an operational lineage graph that shows exactly which data assets (GCS extracts, BigQuery tables) are operated on during each run. This example only scratches the surface; Dagster also makes it easy to (a few of these are sketched below):

- run incremental data loads using partitions
- run pipelines in response to events (e.g., new data landing in SQL Server) instead of only on a schedule
- run individual assets at different cadences, or automatically, to propagate data changes throughout your platform
- run data quality checks
- alert on failures
- attempt automatic retries
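
As an example of a few of these features, here is a minimal sketch (using the same illustrative `orders` asset as above, not code from this project) that combines a daily-partitioned extract, automatic retries, and a simple data quality check:

```python
import dagster as dg

daily = dg.DailyPartitionsDefinition(start_date="2024-01-01")


@dg.asset(partitions_def=daily, retry_policy=dg.RetryPolicy(max_retries=3))
def orders_gcs_extract(context: dg.AssetExecutionContext) -> None:
    # Mock: extract only the rows for the partition's date from SQL Server.
    context.log.info(f"Extracting orders for {context.partition_key}")


@dg.asset_check(asset=orders_gcs_extract)
def orders_extract_not_empty() -> dg.AssetCheckResult:
    # Mock data quality check: in a real pipeline, verify the extract has rows.
    return dg.AssetCheckResult(passed=True)


defs = dg.Definitions(
    assets=[orders_gcs_extract],
    asset_checks=[orders_extract_not_empty],
)
```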