The typical ways of logging individual model run metadata (status, updated timestamp, etc.) are:
- Using post-hooks that basically execute `insert into some_logging_table values ('my_model', 'my_model_status')` queries after each model is built.
- The logging package + post-hooks.
- (Snowflake only) The dbt_artifacts package + run-operations. This essentially uploads dbt artifacts that contain model run metadata into a Snowflake stage and materializes it into various views for querying.
Update: The `dbt_artifacts` package can now work with the `results` context variable, allowing us to upload run results during the on-run-end hook: https://github.com/brooklyn-data/dbt_artifacts/tree/1.0.0b1
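For reference, a minimal sketch of the first approach (the project name, logging table name, and status value here are illustrative, and `some_logging_table` is assumed to already exist):

```yaml
# dbt_project.yml (sketch): attach a logging post-hook to every model.
# `some_logging_table` is assumed to exist; `this.name` resolves to the
# current model's name when the hook runs.
models:
  my_project:
    +post-hook:
      - "insert into some_logging_table values ('{{ this.name }}', 'built', current_timestamp())"
```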
The problem with the above approaches is that post-hooks don't run if a model fails / errors, and, if you're using dbt Cloud, run-operations are skipped if any of the previous dbt Cloud job steps (such as `dbt run`) has errored.

The solution is to make use of `on-run-end` hooks, which still execute even if the building of the models themselves errors.
The following steps show how this can be accomplished (in both dbt Core / CLI and dbt Cloud).
- Set up a barebones project to test this out:
```sql
-- models/my_model_1.sql
select 1 as val
```

```sql
-- models/my_model_2.sql
-- this model should result in an error as column 'x' doesn't exist.
select x from {{ ref('my_model_1') }}
```

```sql
-- models/my_model_3.sql
select * from {{ ref('my_model_1') }}
```
```yaml
# dbt_project.yml
name: "snowflake"
version: "1.0.0"
config-version: 2

profile: "snowflake"

model-paths: ["models"]
macro-paths: ["macros"]

models:
  snowflake:
    +materialized: table

on-run-start:
  - "{{ create_run_end_results_table() }}"

on-run-end:
  - "{{ log_run_end_results(results) }}"
```
- Add the following macros to the project:
```sql
-- macros/log_run_end_results.sql
{% macro create_run_end_results_table() %}
    {{ log('Creating `dbt_meta__run_end_results` table if not exists.', info=True) }}
    create table if not exists {{ target.database }}.{{ target.schema }}.dbt_meta__run_end_results (
        run_invocation_id text not null,
        model_identifier text not null,
        model_status text not null,
        model_message text not null,
        updated_at timestamp not null
    );
{% endmacro %}

{% macro log_run_end_results(results) %}
    {% if execute %}
        {{ log('Recording model run results in `dbt_meta__run_end_results`.', info=True) }}
        {% for res in results %}
            {#
            /*
            Because result messages can contain single quotes in the error
            message, we replace them with double quotes to avoid errors during
            the insert. Skipped nodes may carry no message at all, hence the
            `or ''` fallback.
            */
            #}
            {% set tidy_message = (res.message or '').replace("'", '"') %}
            {% set query -%}
                insert into {{ target.database }}.{{ target.schema }}.dbt_meta__run_end_results values (
                    '{{ invocation_id }}',
                    '{{ res.node.unique_id }}',
                    '{{ res.status }}',
                    '{{ tidy_message }}',
                    current_timestamp()
                );
            {%- endset %}
            {% do run_query(query) %}
        {% endfor %}
    {% endif %}
{% endmacro %}
```
Note that `on-run-end` hooks have special contextual information available to them (such as the `results` variable). I am also making use of the `invocation_id` variable to identify distinct dbt runs.
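Each result object carries more than `status` and `message`; dbt also records a per-node `execution_time`, for instance. A minimal sketch of a companion macro (the macro name is hypothetical) that could likewise be wired into `on-run-end` to log timings:

```sql
{# macros/log_run_end_timings.sql (sketch): print each node's status and timing #}
{% macro log_run_end_timings(results) %}
    {% if execute %}
        {% for res in results %}
            {{ log(res.node.name ~ ': ' ~ res.status ~ ' in ' ~ res.execution_time ~ 's', info=True) }}
        {% endfor %}
    {% endif %}
{% endmacro %}
```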
- Build your models via `dbt run`:
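From the project root (my_model_2 is expected to error while the other two build):

```sh
dbt run
```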
- Query the table to double-check the status of the models:
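For example, a simple query against the logging table (run in the same target database / schema the hooks wrote to):

```sql
select model_identifier, model_status, model_message, updated_at
from dbt_meta__run_end_results
order by updated_at desc;
```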