CREATE MATERIALIZED VIEW agg_tweets AS
SELECT COUNT(tweet) AS total_tweets,
       username
FROM twitter_tweets_enriched
GROUP BY username;
CREATE MATERIALIZED VIEW agg_users AS
SELECT twitter_id,
       COUNT(twitter_id) AS total_tweets
FROM twitter_tweets
GROUP BY twitter_id;
CREATE MATERIALIZED VIEW tweets_hourly AS
SELECT
    date_bin(interval '1 hour', created_at, '2022-03-22') AS time_bucket,
    COUNT(tweet_id) AS total_tweets
FROM twitter_tweets
GROUP BY 1;
SELECT *
FROM twitter_tweets_enriched
WHERE username IN (SELECT username FROM users_not_there)
ORDER BY created_at DESC;

Evolving the dbt-materialize adapter

Tracking issue: #10600

Some things to consider:

Although sources and sinks are inverse concepts, sources have a one-to-many relationship with their downstream relations, while sinks have a one-to-one relationship with their upstream relation. A relation can feed zero or more downstream sinks, though, which gets in the way of implementing sources and sinks as inverse dbt concepts (e.g. using pre- and post-hooks).

Something else to consider: in the wild, source and sink configuration might be owned by a different team than model development (e.g. data engineers vs. analytics engineers), so it'd be preferable not to tightly couple the two. The sketch below illustrates why hooks force exactly that coupling.
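As a minimal sketch of the hook approach, assuming dbt's standard post_hook config and Materialize's CREATE SINK syntax (the sink name, broker, topic, and schema registry URL are all hypothetical):

    -- models/agg_tweets.sql
    -- The materialization name varies by dbt-materialize version; this assumes
    -- the adapter's custom materialized view materialization.
    {{ config(
        materialized='materializedview',
        post_hook="CREATE SINK agg_tweets_sink
                   FROM agg_tweets
                   INTO KAFKA BROKER 'localhost:9092' TOPIC 'agg-tweets'
                   FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://localhost:8081'"
    ) }}

    SELECT COUNT(tweet) AS total_tweets,
           username
    FROM twitter_tweets_enriched
    GROUP BY username

Because hooks are scoped to a single model, a relation that feeds several sinks would have to declare all of them in that one model's post-hooks, putting sink configuration in the hands of whoever owns the model.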

SQL loops in dbt

One handy (and scary 👻) thing that Jinja adds on top of SQL is the ability to write for loops. Here's a rough example that shows how to simplify the statement Frank provided in models__staging__mqis_raw_fiscal_area_shift.sql using loops:

  1. Create a .csv file with the sid_id to time_zone mapping (dim_plant_timezones.csv) and place it under a /seeds directory. The mapping could be embedded directly into the model instead, but the seed approach keeps the file in version control and makes it reusable across models:
sid_id,time_zone
22,america/denver
23,america/chicago
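
  2. In the model, read the mapping back at compile time with dbt's run_query macro and loop over the rows to generate one CASE branch per plant. A minimal sketch, assuming hypothetical column and model names (sid_id, shift_start, mqis_raw), since the original statement isn't reproduced here:

    -- models/staging/mqis_raw_fiscal_area_shift.sql
    {% if execute %}
        {% set tz_rows = run_query("SELECT sid_id, time_zone FROM " ~ ref('dim_plant_timezones')).rows %}
    {% else %}
        {% set tz_rows = [] %}
    {% endif %}

    SELECT
        sid_id,
        shift_start,
        CASE
        {% for row in tz_rows %}
            WHEN sid_id = {{ row['sid_id'] }} THEN timezone('{{ row['time_zone'] }}', shift_start)
        {% endfor %}
        END AS shift_start_local
    FROM {{ ref('mqis_raw') }}

Adding a plant then becomes a one-line change to the seed file rather than an edit to the model's SQL.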

(Source) Table schema evolution via dbt model versioning

In the context of Materialize #26881, this walks through my understanding of how the proposed design would handle schema evolution in dbt-materialize.

OOTB

  1. With the current proposal, we will automatically create a table (or multiple tables, for multi-output sources) the first time someone types CREATE SOURCE:

    -- This creates a source `kafka_src` + a table `t`, which is what you will