Loom: https://www.loom.com/share/84f2ae5463fa48048b9c578244ceb440
Note: dbt's responsiblity is to generate the same DDL/DML everytime for the same dbt sql/jinja. dbt is not responsible for making sure your data is unique, it is not responsible for the shape of your data, etc - you yourself are responsible for that.
At a high level, what we're trying to do here is to:
- At the start of the run, make backups of the snapshot
snappy
and the data it is snapshotting - the modelraw
. - Test that after the snapshot has completed - if the
dbt_scd_id
is still unique. - If it passes, everything is good - so we can drop the backups we created in (1).
- If it fails, it means some duplicates has gotten into the snapshot somehow - here we leave the backups we created in (1) so that we can come back the next day and replay all the steps so as to figure out what rows caused the duplicate. Here we also create a backup copy of
raw
at the end of the run so that we have the exact copy ofraw
as of the time we used it to insert intosnappy
. (This is because we may have other processes or runs that try to modifyraw
before we had the time to do our debugging).
# dbt_project.yml
...
on-run-start: "{{ make_backups() }}"
on-run-end: "{{ check_results_drop_backups(results) }}"
# snapshots/snappy.yml
version: 2
snapshots:
- name: snappy
columns:
- name: dbt_scd_id
tests:
- unique
-- snapshots/snappy.sql
{% snapshot snappy %}
{{
config(
target_schema='dbt_jyeo_snapshots',
unique_key='id',
strategy='check',
check_cols=['user_name', 'user_status']
)
}}
select * from {{ ref('raw') }}
{% endsnapshot %}
-- models/raw.sql
{{ config(materialized='table') }}
select 1 id, 'alice' as user_name, 'inactive' as user_status