Skip to content

Instantly share code, notes, and snippets.

@morsapaes
Last active July 19, 2023 03:43
Show Gist options
  • Save morsapaes/08f595e571ad81e61c1a87f352a1d4f8 to your computer and use it in GitHub Desktop.
Save morsapaes/08f595e571ad81e61c1a87f352a1d4f8 to your computer and use it in GitHub Desktop.

SQL loops in dbt

One handy (and scary 👻) thing that Jinja adds on top of SQL is the ability to run for loops. Here's a rough example that shows how to simplify the statement Frank provided in models__staging__mqis_raw_fiscal_area_shift.sql using loops:

  1. Create a .csv file with the sid_id to time_zone mapping (dim_plant_timezones.csv) and place it under a /seeds directory. This mapping can be directly embedded into the model, too, but the seed approach guarantees that the file can be checked into version control and possibly reused across models:
sid_id,time_zone
22,america/denver
23,america/chicago
24,america/new_york
26,america/chicago
27,america/new_york
29,america/chicago
31,america/chicago
34,america/chicago
40,america/new_york
47,america/new_york
60,america/chicago
70,america/chicago
77,america/chicago
80,america/denver
82,america/chicago
84,america/chicago
91,america/new_york
92,america/chicago
96,america/los_angeles
99,america/new_york
114,america/chicago
2045,america/chicago
2048,america/new_york
2049,america/chicago
2050,america/chicago
2051,america/chicago
  1. Create a sample table in Materialize and insert some data into it:
CREATE TABLE sample_data 
( 
  sid_id text, 
  name text, 
  fq_beg_dt timestamp
);

INSERT INTO sample_data VALUES (22,'Plant 1 DNV', now()), (96,'Plant 1 LA', now()), (24,'Plant 1 NY', now());
  1. Create a model (test_caseloop.sql) and place it under /models. Use the seed to populate the case statement:
{%- set case_data -%}
select sid_id, time_zone
from {{ ref('dim_plant_timezones') }}
{%- endset -%}

{%- set results = run_query(case_data) -%}

{%- if execute -%}
select sid_id,
       name,
       case
    {%- for r in results %}
          when sid_id='{{ r["sid_id"] }}' then fq_beg_dt::timestamp at time zone '{{ r["time_zone"] }}'
    {%- if loop.last -%}
     end as fy_beg_dt
    {%- endif -%}
    {% endfor %}
from sample_data
{% endif %}
  1. Run dbt build and check the compiled SQL under /target/compiled/.../models/test_caseloop.sql. It should look like:
select sid_id,
     name,
     case
        when sid_id='22' then fq_beg_dt::timestamp at time zone 'america/denver'
        when sid_id='80' then fq_beg_dt::timestamp at time zone 'america/denver'
        when sid_id='2049' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='2050' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='2051' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='23' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='26' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='29' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='31' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='34' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='60' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='70' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='77' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='82' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='84' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='92' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='114' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='2045' then fq_beg_dt::timestamp at time zone 'america/chicago'
        when sid_id='2048' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='24' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='27' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='40' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='47' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='91' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='99' then fq_beg_dt::timestamp at time zone 'america/new_york'
        when sid_id='96' then fq_beg_dt::timestamp at time zone 'america/los_angeles'end as fy_beg_dt
from sample_data
  1. Verify that the model was created in Materialize, and that the data looks correct:
materialize=> SELECT * FROM test_caseloop;
 sid_id |    name     |         fy_beg_dt
--------+-------------+----------------------------
 24     | Plant 1 NY  | 2023-07-19 06:45:56.488+00
 96     | Plant 1 LA  | 2023-07-19 09:45:56.488+00
 22     | Plant 1 DNV | 2023-07-19 08:45:56.488+00
(3 rows)

Need to run for my flight, but it should be trivial to extend this to also iterate over the different timestamp fields Frank needs to generate (that's...maybe one level of debugging insanity too deep, if you ask me; but possible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment