Tristan Handy jthandy

Running vacuum and analyze in Sinter

dbt and Sinter have the ability to run regular Redshift maintenance jobs. It's great to set these up early on in a project so that things stay clean as the project grows, and implementing these jobs in Sinter allows the same easy transparency and notifications as with your other dbt jobs.

This document will go through the specific steps necessary to configure vacuum and analyze jobs in the current version of dbt and Sinter. In the future, there will likely be a more idiomatically consistent way to express this logic using native dbt operations. Currently, this does work even if it is not elegant.

Step 1: Create the macros

macros/redshift_maintenance.sql

	order_id	items
	1	[{"id":"fa4b6cd3-4719-4b97-848b-7f2025f5e693","quantity":1,"sku":"M900353-SWB-RYL-2","list_price":60.0},{"id":"c39f9474-a278-4162-9cfa-aa068f4e1665","quantity":1,"sku":"F033199-SWB-FWL-1","list_price":20.0}]
	2	[{"id":"fa4b6cd3-4719-4b97-848b-7f2025f5e693","quantity":1,"sku":"M900353-SWB-RYL-2","list_price":60.0}]

	with
	gains as (
	select date_trunc('month', signup_date) as month, count(*) total_gains
	from {{env.schema}}.mailchimp_members
	group by date_trunc('month', signup_date)
	),

	hard_bounces as
	(
	-- get the first hard bounce date for each email

	with email_summary as (

	select *
	from {{env.schema}}.mailchimp_email_summary

	), best_day_of_week as (

	select date_part(dow, sent_date), avg(opened::float)
	from email_summary
	group by 1

	with email_summary as (

	select *
	from {{env.schema}}.mailchimp_email_summary

	)

	select email_id, sum(clicked)::float / sum(opened)
	from email_summary
	group by 1

	select
	md5(
	coalesce(date_day::varchar, '') \|\|
	coalesce(destination_url, '') \|\|
	coalesce(utm_medium, '') \|\|
	coalesce(utm_source, '') \|\|
	coalesce(utm_campaign, '') \|\|
	coalesce(utm_term, '') \|\|
	coalesce(utm_content, '') \|\|
	coalesce(ad_group_id::varchar, '') \|\|

	select
	ad_performance.*,
	sessions.*
	from ad_performance
	left outer join sessions on ad_performance.id = sessions.ad_performance_id

	with discounts as (

	select * from {{ref('stripe_discounts')}}

	),

	invoice_items as (

	select * from {{ref('stripe_invoice_items')}}

	{% macro source_select_sql(relation, column_name) %}
	select {{column_name}} as f from {{relation}}
	{% endmacro %}


	{% macro single_pass_stats(relation, column_name) %}
	{%- call statement('single_pass_column_stats', fetch_result=True) -%}

	with source as ( {{ fishtown_internal_analytics.source_select_sql(relation, column_name) }} ),
	single_pass_stats as (

	create table dbt_jthandy.flatten_test (
	order_id int,
	json_text varchar(1000)
	)
	;

	insert into dbt_jthandy.flatten_test
	(order_id, json_text)
	values
	(1, '{