Skip to content

Instantly share code, notes, and snippets.

View jeremyyeo's full-sized avatar

Jeremy Yeo jeremyyeo

View GitHub Profile
@jeremyyeo
jeremyyeo / README.md
Last active December 9, 2023 22:44
How to get models to run only on local timezones if your scheduler only supports UTC #dbt

How to get models to run only on local timezones if your orchestration platform only supports UTC

If you're using an orchestration platform that only supports UTC timezones - you may find it difficult cumbersome to schedule jobs to run on local timezones, accounting for daylight savings time switchovers, etc. Let's see how we can write a dbt macro using some builtin python modules to help us out.

-- macros/is_nz_business_hours.sql
{% macro is_nz_business_hours() %}
    {% set flag = 0 %}
    {% set dt = modules.datetime %}
    {% set pz = modules.pytz %}
@jeremyyeo
jeremyyeo / README.md
Last active March 26, 2024 15:34
How to do multi-threaded / parallel operations with dbt #dbt

How to do multi-threaded / parallel operations with dbt

-!  🚨                                          WARNING                                          🚨  !-
Just because you could does not mean you should!

Examples below are using PostgreSQL however the same idea apply across adapters (Snowflake,etc).

In order to run arbitrary SQL commands with dbt - we would typically use hooks or operations. Imagine for a second, we have to run an operation that performs some action multiple times:

@jeremyyeo
jeremyyeo / README.md
Last active November 10, 2023 21:06
How to correctly use a macro that returns a value to a hook #dbt
@jeremyyeo
jeremyyeo / README.md
Last active November 8, 2023 04:08
Using alternative hosts for dbt hub packages #dbt

Using alternative hosts for dbt hub packages

Full documentation for dbt packages are available here: https://docs.getdbt.com/docs/build/packages and this writeup just reuses what's already there.

The most common pattern of using dbt packages is to use one from the dbt Package hub (https://hub.getdbt.com/). For example:

# packages.yml
packages:
  - package: dbt-labs/dbt_utils
@jeremyyeo
jeremyyeo / README.md
Last active November 8, 2023 02:33
Using query comments to identify how things came to be #dbt

Using query comments to identify how things came to be #dbt

This is written for Databricks but the same information apply to any other dbt supported datawarehouse. Note that this also uses dbt Cloud specific env vars but is not necessarily required.

Have you ever asked yourself how things came to be? How did a table get created? How did a schema get created?

Well fret no more, using both:

@jeremyyeo
jeremyyeo / README.md
Last active October 9, 2024 09:30
Freshness checking models instead of sources #dbt

Freshness checking models instead of sources

dbt has a function to check the freshness (i.e. how up to date) of a particular source. That functionality is reserved exclusively for sources and not models - i.e. if you tried to apply source freshness on a model like you would a source:

# models/schemas.yml
version: 2
models:
  - name: foo
 loaded_at_field: updated_at
@jeremyyeo
jeremyyeo / README.md
Last active September 22, 2023 05:07
Converting the timestamps in dbt #dbt

Converting the timestamps in dbt

It's not possible to change the implementation of the internal timestamps recorded by dbt's artifacts (i.e. the timestamps recorded in the on-run-end context results var or the run_results.json file) - so you'll have to manually change it yourself using some builtin modules. Keep in mind that this does not change the original values in run_results.json or in results - those WILL always be in UTC timestamps.

In the example below, we're simply converting one of the timestamps dbt records (when a model is completed) into a different timezone - then it is our responsiblity for then taking that converted value and writing it to a table in the database if we want to.

# dbt_project.yml
name: my_dbt_project
profile: all
@jeremyyeo
jeremyyeo / README.md
Last active September 1, 2023 03:47
Can dbt introspect a table created in a pre-hook if we try to query from it in the body of the model? #dbt

Can dbt introspect a table created in a pre-hook if we try to query from it in the body of the model?

If we are good dbt-citizens and stay within the bounds of dbt's normal behaviour - then this is not possible. See the following example.

Make sure that the table that we are about to create in our hook, does not yet exist in our database.

drop table if exists development.dbt_jyeo.this_doesnt_exist;
@jeremyyeo
jeremyyeo / README.md
Last active August 31, 2023 10:37
What keys to use from results or graph to associate a test with a node (model/source) #dbt

What keys to use from results or graph to associate a test with a node (model/source)

Associating a model with its test

# dbt_project.yml
name: my_dbt_project
CREATE OR REPLACE PROCEDURE dbt_jyeo.fake_sleep(seconds INT64)
BEGIN
DECLARE wait BOOL DEFAULT TRUE;
DECLARE delay_time DATETIME DEFAULT DATETIME_ADD(CURRENT_DATETIME, INTERVAL seconds SECOND);
WHILE wait DO
IF (delay_time < CURRENT_DATETIME) THEN
SET wait = FALSE;
END IF;
END WHILE;
SELECT 'Slept for ' || seconds || ' seconds';