Update: See this article on sharing Python functions across models https://github.com/jeremyyeo/the-hitchhikers-guide-to-dbt/tree/main/snowflake-python-models-shared-functions
Here's a quick example of converting a dbt jinja macro used in a sql model into a python function that is used in a python model instead. It is currently not possible to use a jinja macro as is in a python model.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"
models:
my_dbt_project:
+materialized: table
-- models/foo.sql
select 1 as v
union
select 2 as v
-- models/bar.sql
select {{ modify_column('v') }} as v_modified
from {{ ref('foo') }}
-- macros/modify_column.sql
{% macro modify_column(column_name) -%}
concat('$', {{ column_name }})
{%- endmacro %}
^ The business logic here is that we have a model bar
that selects from foo
- but we want to add a dollar sign $
to the front of each value and we use a jinja macro modify_column
for that. As you can tell the jinja macro just returns a templated string with the SQL function concat
.
$ dbt run
00:54:19 Running with dbt=1.7.11
00:54:21 Registered adapter: snowflake=1.7.2
00:54:22 Found 2 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:54:22
00:54:25 Concurrency: 4 threads (target='sf')
00:54:25
00:54:25 1 of 2 START sql table model dbt_jyeo.foo ...................................... [RUN]
00:54:27 1 of 2 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.35s]
00:54:27 2 of 2 START sql table model dbt_jyeo.bar ...................................... [RUN]
00:54:29 2 of 2 OK created sql table model dbt_jyeo.bar ................................. [SUCCESS 1 in 1.95s]
00:54:29
00:54:29 Finished running 2 table models in 0 hours 0 minutes and 7.49 seconds (7.49s).
00:54:29
00:54:29 Completed successfully
00:54:29
00:54:29 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
$ dbt show --inline 'select * from bar'
00:55:45 Running with dbt=1.7.11
00:55:47 Registered adapter: snowflake=1.7.2
00:55:48 Found 2 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
00:55:48
00:55:50 Concurrency: 4 threads (target='sf')
00:55:50
00:55:52 Previewing inline node:
| V_MODIFIED |
| ---------- |
| $1 |
| $2 |
Let's see how we can do this in a python model instead:
# models/baz.py
def modify_column(x):
return '$' + str(x)
def model(dbt, session):
foo_df = dbt.ref("foo").to_pandas()
foo_df["V_MODIFIED"] = foo_df["V"].apply(modify_column)
baz_df = foo_df[["V_MODIFIED"]]
return baz_df
^ The modify_column
function here effectively does the same thing as our jinja macro in our sql model bar
above.
$ dbt run -s baz
01:12:59 Running with dbt=1.7.11
01:13:00 Registered adapter: snowflake=1.7.2
01:13:01 Found 3 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
01:13:01
01:13:04 Concurrency: 4 threads (target='sf')
01:13:04
01:13:04 1 of 1 START python table model dbt_jyeo.baz ................................... [RUN]
01:13:09 1 of 1 OK created python table model dbt_jyeo.baz .............................. [SUCCESS 1 in 4.94s]
01:13:09
01:13:09 Finished running 1 table model in 0 hours 0 minutes and 8.04 seconds (8.04s).
01:13:09
01:13:09 Completed successfully
01:13:09
01:13:09 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
$ dbt show --inline 'select * from baz'
01:15:14 Running with dbt=1.7.11
01:15:15 Registered adapter: snowflake=1.7.2
01:15:17 Found 3 models, 0 sources, 0 exposures, 0 metrics, 431 macros, 0 groups, 0 semantic models
01:15:17
01:15:18 Concurrency: 4 threads (target='sf')
01:15:18
01:15:20 Previewing inline node:
| V_MODIFIED |
| ---------- |
| $1 |
| $2 |