Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active November 7, 2024 22:47
Show Gist options
  • Save jeremyyeo/06d552ee8facc8100416655ebc25d9b9 to your computer and use it in GitHub Desktop.
Save jeremyyeo/06d552ee8facc8100416655ebc25d9b9 to your computer and use it in GitHub Desktop.
A pattern for moving dbt vars from dbt_project.yml to macros #dbt

A pattern for moving dbt vars from dbt_project.yml to macros

dbt doesn't currently support var declarations outside of the dbt_project.yml file - see relevant issue. This means that multiple users that use vars may run into conflicts or issues as they are all updating the dbt_project.yml file at the same time.

This is a temporary workaround while the core team works on enabling the above functionality.

  1. Setup the dbt_project.yml file:
# dbt_project.yml
name: my_dbt_project
profile: snowflake
config-version: 2
version: 1.0

models:
  my_dbt_project:
    +materialized: table

As you can tell from above, it is not necessary to set any vars in dbt_project.yml for this exercise.

  1. Let's create a macro that looks up a dictionary and returns the value:
-- macros/cvar.sql
{% macro cvar(var_name) -%}

    {%- 
        set all_project_vars = {
            "unit": var("unit", "day"),
            "offset": var("offset", "1")
        }
    -%}
    
    {{ return(all_project_vars[var_name]) }}

{%- endmacro %}

Importantly, all_project_vars is a dictionary with each key value assigned to an actual dbt var with a default value - this is so that we can override the value at runtime (we'll come to that a bit later) but still have the value be set to something (just like we would in the dbt_project.yml file).

  1. Add a toy model to test out the above pattern.
-- models/the_right_time.sql
select 
    dateadd(
        '{{ cvar("unit") }}',
        '{{ cvar("offset") }}',
        to_timestamp_ntz('1970-01-01')
    ) as t
  1. Now, let's try out some runs and see the results:
$ dbt run

00:24:16  On model.my_dbt_project.the_right_time: /* {"app": "dbt", "dbt_version": "1.5.1", "profile_name": "snowflake", "target_name": "default", "node_id": "model.my_dbt_project.the_right_time"} */
create or replace transient table development.dbt_jyeo.the_right_time
         as
        (select 
    dateadd(
        'day',
        '1',
        to_timestamp_ntz('1970-01-01')
    ) as modified_t
        );
$ select * from development.dbt_jyeo.the_right_time;
+-------------------------+                                                     
| MODIFIED_T              |
|-------------------------|
| 1970-01-02 00:00:00.000 |
+-------------------------+

As expected, we didn't have any run time var overrides so we added a day to the epoch date. Now, let's try some run time overrides:

$ dbt run --vars 'offset: 10'

00:27:21  On model.my_dbt_project.the_right_time: /* {"app": "dbt", "dbt_version": "1.5.1", "profile_name": "snowflake", "target_name": "default", "node_id": "model.my_dbt_project.the_right_time"} */
create or replace transient table development.dbt_jyeo.the_right_time
         as
        (select 
    dateadd(
        'day',
        '10',
        to_timestamp_ntz('1970-01-01')
    ) as modified_t
        );
$ select * from development.dbt_jyeo.the_right_time;
+-------------------------+                                                     
| MODIFIED_T              |
|-------------------------|
| 1970-01-11 00:00:00.000 |
+-------------------------+

^ 10 days added as expected as we overrode the offset var at run time. And finally, overriding both vars at the same time:

$ dbt run --vars '{unit: year, offset: 50}'

00:29:45  On model.my_dbt_project.the_right_time: /* {"app": "dbt", "dbt_version": "1.5.1", "profile_name": "snowflake", "target_name": "default", "node_id": "model.my_dbt_project.the_right_time"} */
create or replace transient table development.dbt_jyeo.the_right_time
         as
        (select 
    dateadd(
        'year',
        '50',
        to_timestamp_ntz('1970-01-01')
    ) as modified_t
        );
$ select * from development.dbt_jyeo.the_right_time;
+-------------------------+                                                     
| MODIFIED_T              |
|-------------------------|
| 2020-01-01 00:00:00.000 |
+-------------------------+

We've successfully added 50 years.

As users want to add new vars to the project, they simply need to modify the macro by adding it to the dictonary and not actually have to touch dbt_project.yml:

-- macros/cvar.sql
{% macro cvar(var_name) -%}

    {%- 
        set all_project_vars = {
            "unit": var("unit", "day"),
            "offset": var("offset", "1"),
            "some_new_variable": var("some_new_variable", "foo")
        }
    -%}
    
    {{ return(all_project_vars[var_name]) }}

{%- endmacro %}
@jeremyyeo
Copy link
Author

Because it's a macro you can not use it in db_project.yml, correct? Hence why we stick with environment variables :(

Yeah - I think that's accurate - because dbt_project.yml cannot use macros like normal. Above is pretty much only for the most basic use case where you declare vars and then use them in models for filtering or what have you...

# dbt_project.yml
...
vars:
  look_back_period: '1970-01-01'
...
-- some_model.sql
select * from foo where date >= '{{ var("look_back_period") }}'

@timvw
Copy link

timvw commented Jun 15, 2023

Yes.. I tend to do the same

(but do not specify a (default) value for the variable.. That way, dbt greets me with an error when trying to run a model where the variable is not provided).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment