CREATE MATERIALIZED VIEW agg_tweets AS
SELECT COUNT(tweet) AS total_tweets,
       username
FROM twitter_tweets_enriched
GROUP BY username;
CREATE MATERIALIZED VIEW agg_users AS
SELECT twitter_id,
       COUNT(twitter_id) AS total_tweets
FROM twitter_tweets
GROUP BY twitter_id;
CREATE MATERIALIZED VIEW tweets_hourly AS
SELECT
    date_bin(interval '1 hour', created_at, '2022-03-22') AS time_bucket,
    COUNT(tweet_id) AS total_tweets
FROM twitter_tweets
GROUP BY 1;
SELECT *
FROM twitter_tweets_enriched
WHERE username IN (SELECT username FROM users_not_there)
ORDER BY created_at DESC;

Evolving the dbt-materialize adapter

Tracking issue: #10600

Some things to consider:

Although sources and sinks are inverse concepts, sources have a one-to-many relationship with their downstream relations, while sinks have a one-to-one relationship with their upstream relation. A relation can feed zero or more downstream sinks, though, which gets in the way of implementing sources and sinks as inverse dbt concepts (e.g. using pre- and post-hooks).

Something else to consider: in the wild, source and sink configuration might be owned by a different team than model development (e.g. data engineers vs. analytics engineers), so it'd be preferable not to tightly couple the two. The sketch below illustrates why hooks force exactly that coupling.
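As a minimal sketch of the hook approach, assuming dbt's standard post_hook config and Materialize's CREATE SINK syntax (the sink name, broker, topic, and schema registry URL are all hypothetical):

    -- models/agg_tweets.sql
    -- The materialization name varies by dbt-materialize version; this assumes
    -- the adapter's custom materialized view materialization.
    {{ config(
        materialized='materializedview',
        post_hook="CREATE SINK agg_tweets_sink
                   FROM agg_tweets
                   INTO KAFKA BROKER 'localhost:9092' TOPIC 'agg-tweets'
                   FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081'"
    ) }}

    SELECT COUNT(tweet) AS total_tweets,
           username
    FROM twitter_tweets_enriched
    GROUP BY username

Because hooks are scoped to a single model, a relation that feeds several sinks would have to declare all of them in that one model's post-hooks, putting sink configuration in the hands of whoever owns the model.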

SQL loops in dbt

One handy (and scary 👻) thing that Jinja adds on top of SQL is the ability to write for loops. Here's a rough example that shows how to simplify the statement Frank provided in models__staging__mqis_raw_fiscal_area_shift.sql using loops:

  1. Create a .csv file with the sid_id to time_zone mapping (dim_plant_timezones.csv) and place it under a /seeds directory. The mapping could be embedded directly into the model instead, but the seed approach keeps the file in version control and makes it reusable across models:
sid_id,time_zone
22,america/denver
23,america/chicago
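
  2. In the model, read the mapping back at compile time with dbt's run_query macro and loop over the rows to generate one CASE branch per plant. A minimal sketch, assuming hypothetical column and model names (sid_id, shift_start, mqis_raw), since the original statement isn't reproduced here:

    -- models/staging/mqis_raw_fiscal_area_shift.sql
    {% if execute %}
        {% set tz_rows = run_query("SELECT sid_id, time_zone FROM " ~ ref('dim_plant_timezones')).rows %}
    {% else %}
        {% set tz_rows = [] %}
    {% endif %}

    SELECT
        sid_id,
        shift_start,
        CASE
        {% for row in tz_rows %}
            WHEN sid_id = {{ row['sid_id'] }} THEN timezone('{{ row['time_zone'] }}', shift_start)
        {% endfor %}
        END AS shift_start_local
    FROM {{ ref('mqis_raw') }}

Adding a plant then becomes a one-line change to the seed file rather than an edit to the model's SQL.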

(Source) Table schema evolution via dbt model versioning

In the context of Materialize #26881, this walks through my understanding of how the proposed design would handle schema evolution in dbt-materialize.

OOTB

  1. With the current proposal, we will automatically create a table (or multiple tables, for multi-output sources) the first time someone types CREATE SOURCE:

    -- This creates a source `kafka_src` + a table `t`, which is what you will