@eponkratova
Forked from jeremyyeo/README.md
Created November 25, 2022 06:39
Overriding dbt Cloud default database / schema on CI runs #dbt

🚨 WARNING 🚨
You probably do not want to do this, because dbt Cloud will not be able to drop the relevant schema
upon PR merge / close, so you will end up with clutter if you are not on top of this.

The following is the default behaviour of dbt Cloud CI runs when:

Determining the database to write to.

The database is determined by the environment that the CI job is running in.

[Screenshot: the CI environment's settings, with the database set to DEVELOPMENT]

In the screenshot above, this would mean that our CI runs will build models into the DEVELOPMENT database.

Determining the schema to write to.

The schema is templated and takes the form of DBT_CLOUD_PR_<job_id>_<pr_number>.

[Screenshot: the CI job's settings, showing the job id 41386]

The job_id above is 41386.

[Screenshot: the pull request, number 9]

And the pr_number above is 9.

This results in a schema of DBT_CLOUD_PR_41386_9.

Given the above, the default behaviour of our CI run is to write to DEVELOPMENT.DBT_CLOUD_PR_41386_9, and when PRs are merged or closed, we can expect dbt Cloud to "clean up" by running something like the following query:

DROP SCHEMA IF EXISTS DEVELOPMENT.DBT_CLOUD_PR_41386_9;

Note that dbt Cloud can still drop a schema as long as its name contains the target schema determined above, so it can also drop the following:

DEVELOPMENT.DBT_CLOUD_PR_41386_9_FOO
DEVELOPMENT.DBT_CLOUD_PR_41386_9_FOO_BAR
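For reference, these suffixed schemas come from dbt's built-in naming: when a model configures a custom schema, the stock macro appends it to the target schema. The following is paraphrased from dbt-core's default__generate_schema_name (check your dbt version for the exact source):

{% macro default__generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {# A custom schema is appended to the target schema, e.g. DBT_CLOUD_PR_41386_9_FOO. #}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}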

Overriding the default behaviour

If we want to change the above behaviour (only the "writing to" parts, mind you, not the "cleanup" parts), we can first change the target name of our CI job:

[Screenshot: the CI job's settings, with the target name set to ci]

Here we're changing the target name to be ci so that we can use it in our macro overrides.

To change the database that our CI job is writing to.

Override the default generate_database_name macro:

{# -- put this in macros/generate_database_name.sql #}
{%- macro generate_database_name(custom_database_name=none, node=none) -%}
    {%- if target.name == 'ci' -%}
        {# CI runs always write to this database. #}
        ci_jyeo
    {%- else -%}
        {# Fall back to dbt's built-in naming so custom database configs still apply. #}
        {{ default__generate_database_name(custom_database_name, node) }}
    {%- endif -%}
{%- endmacro -%}

The above macro means that our CI job will always write models to the CI_JYEO database, independent of the environment's database config (which is set to DEVELOPMENT above).
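For comparison, this is the stock behaviour being overridden, paraphrased from dbt-core's default__generate_database_name (check your dbt version for the exact source):

{% macro default__generate_database_name(custom_database_name=none, node=none) -%}
    {%- set default_database = target.database -%}
    {%- if custom_database_name is none -%}
        {{ default_database }}
    {%- else -%}
        {{ custom_database_name }}
    {%- endif -%}
{%- endmacro %}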

To change the schema that our CI job is writing to.

Override the default generate_schema_name macro:

{# -- put this in macros/generate_schema_name.sql #}
{% macro generate_schema_name(custom_schema_name=none, node=none) -%}
    {%- if target.name == 'ci' -%}
        {# CI runs always write to this schema. #}
        ci_jyeo
    {%- else -%}
        {# Fall back to dbt's built-in naming so custom schema configs still apply. #}
        {{ default__generate_schema_name(custom_schema_name, node) }}
    {%- endif -%}
{%- endmacro %}

The above macro means that our CI job will always write models to the CI_JYEO schema instead of DBT_CLOUD_PR_41386_9 as determined above.

As mentioned above, using these overrides means dbt Cloud will not be able to drop your CI schema upon PR merge / close. An alternative is to schedule an end-of-week job to do the cleanup for us.
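That cleanup could itself be a run-operation macro. A minimal sketch, assuming the ci_jyeo schema from the overrides above (the macro name drop_ci_schema is hypothetical; qualify the schema with a database to match whichever overrides you use):

{# -- hypothetical macros/drop_ci_schema.sql #}
{# -- schedule a weekly dbt Cloud job that runs: dbt run-operation drop_ci_schema #}
{%- macro drop_ci_schema() -%}
    {% do log("Dropping schema 'ci_jyeo'", info=True) %}
    {% do run_query("DROP SCHEMA IF EXISTS ci_jyeo") %}
    {% do log("Schema dropped", info=True) %}
{%- endmacro -%}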

Snowflake workflow

For folks who are on Snowflake and want to Zero Copy Clone their production database, we can use macros like:

{%- macro clone_db() -%}
    {% do log("Cloning database 'development' to 'ci_jyeo'", info=True) %}
    {# Snowflake executes one statement per query, so run each statement separately. #}
    {% do run_query("DROP DATABASE IF EXISTS ci_jyeo") %}
    {% do run_query("CREATE DATABASE ci_jyeo CLONE development") %}
    {% do run_query("GRANT ALL ON DATABASE ci_jyeo TO ROLE transformer") %}
    {% do log("Database cloning completed", info=True) %}
{%- endmacro -%}

{%- macro drop_db() -%}
    {% set sql %}
        DROP DATABASE IF EXISTS ci_jyeo;
    {% endset %}
    {% do log("Dropping database 'ci_jyeo'", info=True) %}
    {% do run_query(sql) %}
    {% do log("Database dropped", info=True) %}
{%- endmacro -%}

And in our CI job we have the following steps:

dbt run-operation clone_db  # Clone prod to our ci db.
dbt run -s state:modified+  # Build the new models in the CI db in the CI schema, assuming we used both overrides above.
dbt test                    # Run tests.
dbt run-operation drop_db   # Drop the CI db - you can as easily adapt the macro to drop the CI schema instead.

The above steps mean that in the event of test failure, we do not proceed to the next step which drops our CI db / schema.

Help with PR schemas not dropping

If for some reason your PR schemas are not dropping after a PR is merged/closed (and you are not overriding them like the above examples show), you can use the following macros to help drop them:
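The original gist's macros are not reproduced here, but on Snowflake one approach could look like the following sketch (the macro name, the dry_run parameter, and the information_schema query are assumptions, not the gist author's code):

{# -- hypothetical macros/drop_pr_schemas.sql #}
{%- macro drop_pr_schemas(dry_run=true) -%}
    {# Find leftover dbt Cloud PR schemas in the target database. #}
    {% set find_sql %}
        select schema_name
        from {{ target.database }}.information_schema.schemata
        where schema_name like 'DBT_CLOUD_PR_%'
    {% endset %}
    {% if execute %}
        {% for row in run_query(find_sql).rows %}
            {% do log("Dropping schema " ~ row[0], info=True) %}
            {% if not dry_run %}
                {% do run_query("drop schema if exists " ~ target.database ~ "." ~ row[0]) %}
            {% endif %}
        {% endfor %}
    {% endif %}
{%- endmacro -%}

Invoke it with dbt run-operation drop_pr_schemas --args '{dry_run: false}' once you are happy with the dry-run output.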
