Jeremy Yeo (jeremyyeo)

@jeremyyeo
jeremyyeo / test_fabric.py
Last active July 12, 2024 06:21
Test connecting to MS Fabric
# pip install pyodbc
import pyodbc
# From Entra.
# https://prodata.ie/2023/11/15/service-principal-fabric/
service_principal_client_id = "<Client ID>"
service_principal_client_secret = "<Client Secret>"
# From Fabric UI - there is a 'SQL connection string' that we will use as the 'server'.
server = "<guid>-<guid>.datawarehouse.fabric.microsoft.com"
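The preview above stops at the server variable. As a hedged sketch of how these pieces typically combine, here is one plausible way to assemble the pyodbc connection string — the driver name, the `database` placeholder, and the `Authentication=ActiveDirectoryServicePrincipal` keyword are assumptions based on common service-principal setups (and the linked article), not code from the gist itself:

```python
# Sketch: assembling a service-principal connection string for MS Fabric.
# Driver name and Authentication keyword are assumptions (typical for
# ODBC Driver 18 with Entra service principals); the gist is truncated.
service_principal_client_id = "<Client ID>"
service_principal_client_secret = "<Client Secret>"
server = "<guid>-<guid>.datawarehouse.fabric.microsoft.com"
database = "<warehouse name>"  # hypothetical placeholder

conn_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    f"SERVER={server};"
    f"DATABASE={database};"
    f"UID={service_principal_client_id};"
    f"PWD={service_principal_client_secret};"
    "Authentication=ActiveDirectoryServicePrincipal;"
)
# With pyodbc installed, you would then connect with:
# conn = pyodbc.connect(conn_str)
```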
@jeremyyeo
jeremyyeo / README.md
Last active August 8, 2024 03:00
Converting a dbt jinja macro to a python function for use in a python model #dbt

Converting a dbt jinja macro to a python function for use in a python model

Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions

Here's a quick example of converting a dbt jinja macro used in a SQL model into a python function used in a python model instead. It is currently not possible to use a jinja macro as-is in a python model.

# dbt_project.yml
name: my_dbt_project
profile: all
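As a minimal sketch of the conversion (the `cents_to_dollars` macro and column names here are hypothetical stand-ins, not from the gist): a Jinja macro only templates SQL text, so inside a Python model you re-express the same logic as a plain Python function and apply it to the dataframe:

```python
# models/my_python_model.py -- illustrative sketch only; the
# cents_to_dollars logic stands in for whatever your macro computed.
def cents_to_dollars(cents: int) -> float:
    # Same arithmetic the Jinja macro would have templated into SQL.
    return cents / 100

def model(dbt, session):
    # dbt.ref(...) returns the adapter's dataframe; .to_pandas() assumes
    # a Snowpark-style dataframe (as in Snowflake python models).
    df = dbt.ref("my_upstream_model").to_pandas()
    df["amount_usd"] = df["amount_cents"].apply(cents_to_dollars)
    return df
```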
@jeremyyeo
jeremyyeo / README.md
Last active July 11, 2024 22:52
Debugging dbt snapshots #dbt

Debugging dbt snapshots

Loom: https://www.loom.com/share/84f2ae5463fa48048b9c578244ceb440

Note: dbt's responsibility is to generate the same DDL/DML every time for the same dbt SQL/Jinja. dbt is not responsible for making sure your data is unique, nor for the shape of your data - you yourself are responsible for that.

At a high level, what we're trying to do here is to:

  1. At the start of the run, make backups of the snapshot snappy and the data it is snapshotting - the model raw.
  2. After the snapshot has completed, test whether the dbt_scd_id is still unique.
import csv
import datetime
import string
from faker import Faker # make sure to `pip install Faker`
NUMS = 10000
fake = Faker()
current_time = datetime.datetime.utcnow()
# header = [
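Step 2 above boils down to a uniqueness check on dbt_scd_id. A minimal sketch of that check, using stdlib sqlite3 in place of a real warehouse (the table name mirrors the snapshot `snappy` from the steps above; the rows are fabricated for illustration):

```python
import sqlite3

# Stand-in for the snapshot table; in practice you'd run this same
# group-by query against your warehouse, not sqlite.
con = sqlite3.connect(":memory:")
con.execute("create table snappy (dbt_scd_id text)")
con.executemany(
    "insert into snappy values (?)",
    [("a",), ("b",), ("b",)],  # "b" is deliberately duplicated
)
dupes = con.execute(
    "select dbt_scd_id, count(*) as n from snappy "
    "group by dbt_scd_id having n > 1"
).fetchall()
print(dupes)  # any rows here mean the snapshot's scd id is not unique
```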
@jeremyyeo
jeremyyeo / README.md
Last active February 2, 2024 06:23
Which dbt nodes respect the generate_schema_name macro? #dbt

Which dbt nodes respect the generate_schema_name macro?

dbt has many types of "nodes"/"resources" - e.g. models, sources, seeds - so which of them actually respect the generate_schema_name() macro? Let's have a look.

The following is tested using:

Core:
  - installed: 1.7.7
  - latest:    1.7.7 - Up to date!
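For reference, the behaviour the nodes may or may not respect can be paraphrased in Python roughly as follows — this mirrors dbt's documented default for `generate_schema_name` (no custom schema falls back to the target schema, otherwise the two are concatenated), and is a paraphrase rather than code from the gist:

```python
def generate_schema_name(custom_schema_name, target_schema):
    # Paraphrase of dbt's documented default: no custom schema -> use the
    # target schema; otherwise "<target_schema>_<custom_schema>".
    if custom_schema_name is None:
        return target_schema
    return f"{target_schema}_{custom_schema_name.strip()}"
```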
@jeremyyeo
jeremyyeo / README.md
Last active November 7, 2024 10:13
Stopping dbt from auto-expanding column types #dbt

Stopping dbt from auto-expanding column types

By default, dbt has functionality that auto-expands similar column types (e.g. varchar(3) to varchar(4)) if the incoming data is too wide for the existing column (https://docs.getdbt.com/faqs/Snapshots/snapshot-schema-changes). We can see this happening like so:

-- models/foo.sql
{{ config(materialized='incremental') }}
select 'foo' as c
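The expansion decision amounts to comparing declared sizes and keeping the wider one. A hedged Python paraphrase of that idea (illustrative only — dbt's real adapter logic covers more types and edge cases):

```python
import re

def expanded_type(current: str, incoming: str) -> str:
    """Return the wider of two varchar(N) declarations.
    Illustrative sketch, not dbt's actual implementation."""
    sizes = []
    for t in (current, incoming):
        m = re.fullmatch(r"varchar\((\d+)\)", t)
        if not m:
            raise ValueError(f"unsupported type: {t}")
        sizes.append(int(m.group(1)))
    return f"varchar({max(sizes)})"
```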
@jeremyyeo
jeremyyeo / README.md
Last active December 12, 2023 01:09
A quick explainer of the "dbt was unable to infer all dependencies for the model" error #dbt
@jeremyyeo
jeremyyeo / README.md
Last active December 9, 2023 22:44
How to get models to run only on local timezones if your scheduler only supports UTC #dbt

How to get models to run only on local timezones if your orchestration platform only supports UTC

If you're using an orchestration platform that only supports UTC timezones - you may find it cumbersome to schedule jobs to run on local timezones while accounting for daylight savings time switchovers, etc. Let's see how we can write a dbt macro using some built-in python modules to help us out.

-- macros/is_nz_business_hours.sql
{% macro is_nz_business_hours() %}
    {% set flag = 0 %}
    {% set dt = modules.datetime %}
    {% set pz = modules.pytz %}
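The macro preview above truncates before the actual check. The same logic can be sketched in plain Python — note the macro uses `modules.pytz`, while this sketch swaps in the stdlib `zoneinfo` module so it is self-contained, and the Mon-Fri 9am-5pm window is an assumed definition of "business hours":

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib (Python 3.9+); the macro uses pytz

def is_nz_business_hours(now_utc: datetime) -> int:
    """Return 1 during Mon-Fri 9am-5pm Pacific/Auckland time, else 0.
    DST switchovers are handled by the timezone conversion itself."""
    local = now_utc.astimezone(ZoneInfo("Pacific/Auckland"))
    if local.weekday() < 5 and 9 <= local.hour < 17:
        return 1
    return 0
```

Models can then be wrapped in an `{% if %}` on the flag so they only build when the local-time condition holds, regardless of the UTC schedule.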
@jeremyyeo
jeremyyeo / README.md
Last active March 26, 2024 15:34
How to do multi-threaded / parallel operations with dbt #dbt

How to do multi-threaded / parallel operations with dbt

-!  🚨                                          WARNING                                          🚨  !-
Just because you could does not mean you should!

Examples below use PostgreSQL; however, the same idea applies across adapters (Snowflake, etc.).

In order to run arbitrary SQL commands with dbt - we would typically use hooks or operations. Imagine for a second, we have to run an operation that performs some action multiple times:
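The shape of the threaded approach, on the Python side and independent of dbt, is a pool of workers each running one statement. A minimal sketch (the statements and the `run_statement` stand-in are illustrative; with dbt you would fire these from a macro or operation, typically one connection per thread):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for "execute one SQL statement against the
# warehouse"; replace with a real per-thread connection + execute.
def run_statement(stmt: str) -> str:
    return f"done: {stmt}"

statements = [f"grant select on table_{i} to role_x" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order even though execution is parallel
    results = list(pool.map(run_statement, statements))
print(results)
```

Heed the warning above: parallel DDL/DML can deadlock or thrash the warehouse, so keep `max_workers` modest.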

@jeremyyeo
jeremyyeo / README.md
Last active November 10, 2023 21:06
How to correctly use a macro that returns a value to a hook #dbt