Skip to content

Instantly share code, notes, and snippets.

@jeremyyeo
Last active August 8, 2024 03:00
Show Gist options
  • Save jeremyyeo/617e8c8e6549bd8d0ae0814a0273bae4 to your computer and use it in GitHub Desktop.
Save jeremyyeo/617e8c8e6549bd8d0ae0814a0273bae4 to your computer and use it in GitHub Desktop.
Converting a dbt jinja macro to a python function for use in a python model #dbt

Converting a dbt jinja macro to a python function for use in a python model

Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions

Here's a quick example of converting a dbt jinja macro used in a sql model into a python function that is used in a python model instead. It is currently not possible to use a jinja macro as is in a python model.

# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"
models:
  my_dbt_project:
    +materialized: table
-- models/foo.sql
select 1 as v
union
select 2 as v

-- models/bar.sql
select {{ modify_column('v') }} as v_modified
  from {{ ref('foo') }}

-- macros/modify_column.sql
{% macro modify_column(column_name) -%}
    concat('$', {{ column_name }})
{%- endmacro %}

^ The business logic here is that we have a model bar that selects from foo - but we want to add a dollar sign $ to the front of each value and we use a jinja macro modify_column for that. As you can tell the jinja macro just returns a templated string with the SQL function concat.

$ dbt run
00:54:19  Running with dbt=1.7.11
00:54:21  Registered adapter: snowflake=1.7.2
00:54:22  Found 2 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:54:22  
00:54:25  Concurrency: 4 threads (target='sf')
00:54:25  
00:54:25  1 of 2 START sql table model dbt_jyeo.foo ...................................... [RUN]
00:54:27  1 of 2 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.35s]
00:54:27  2 of 2 START sql table model dbt_jyeo.bar ...................................... [RUN]
00:54:29  2 of 2 OK created sql table model dbt_jyeo.bar ................................. [SUCCESS 1 in 1.95s]
00:54:29  
00:54:29  Finished running 2 table models in 0 hours 0 minutes and 7.49 seconds (7.49s).
00:54:29  
00:54:29  Completed successfully
00:54:29  
00:54:29  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

$ dbt show --inline 'select * from bar'
00:55:45  Running with dbt=1.7.11
00:55:47  Registered adapter: snowflake=1.7.2
00:55:48  Found 2 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:55:48  
00:55:50  Concurrency: 4 threads (target='sf')
00:55:50  
00:55:52  Previewing inline node:
| V_MODIFIED |
| ---------- |
| $1         |
| $2         |

Let's see how we can do this in a python model instead:

# models/baz.py
def modify_column(x):
    return '$' + str(x)

def model(dbt, session):
    foo_df = dbt.ref("foo").to_pandas()
    foo_df["V_MODIFIED"] = foo_df["V"].apply(modify_column)
    baz_df = foo_df[["V_MODIFIED"]]
    return baz_df

^ The modify_column function here effectively does the same thing as our jinja macro in our sql model bar above.

$ dbt run -s baz
01:12:59  Running with dbt=1.7.11
01:13:00  Registered adapter: snowflake=1.7.2
01:13:01  Found 3 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
01:13:01  
01:13:04  Concurrency: 4 threads (target='sf')
01:13:04  
01:13:04  1 of 1 START python table model dbt_jyeo.baz ................................... [RUN]
01:13:09  1 of 1 OK created python table model dbt_jyeo.baz .............................. [SUCCESS 1 in 4.94s]
01:13:09  
01:13:09  Finished running 1 table model in 0 hours 0 minutes and 8.04 seconds (8.04s).
01:13:09  
01:13:09  Completed successfully
01:13:09  
01:13:09  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

$ dbt show --inline 'select * from baz'
01:15:14  Running with dbt=1.7.11
01:15:15  Registered adapter: snowflake=1.7.2
01:15:17  Found 3 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
01:15:17  
01:15:18  Concurrency: 4 threads (target='sf')
01:15:18  
01:15:20  Previewing inline node:
| V_MODIFIED |
| ---------- |
| $1         |
| $2         |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment