Skip to content

Instantly share code, notes, and snippets.

View robert8138's full-sized avatar

Robert Chang robert8138

  • Airbnb
  • San Francisco
View GitHub Profile
@robert8138
robert8138 / incremental_load.sql
Last active July 11, 2022 12:08
Demonstrate how to do incremental load
-- Not Recommended Approach: Scan the entire table and rebuild everyday
INSERT OVERWRITE TABLE dim_total_bookings PARTITION (ds = '{{ ds }}')
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
FROM
fct_bookings
WHERE
ds <= '{{ ds }}' -- this is expensive, and can quickly run into scale issue
GROUP BY
@robert8138
robert8138 / paremterized_workflow.sql
Last active June 8, 2020 23:44
Parameterize workflow to incorporate backfill logic
{%- if backfill %}
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds)
{%- else %}
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds = '{{ ds }}')
{%- endif %}
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
{%- if backfill %}
, ds
CREATE TABLE IF NOT EXISTS fct_bookings (
id_listing BIGINT COMMENT 'Unique ID of the listing'
, id_host BIGINT COMMENT 'Unique ID of the host who owns the listing'
, m_bookings BIGINT COMMENT 'Denoted 1 if a booking transaction occurred'
)
PARTITION BY ( -- this is how we define partition keys
ds STRING
);
# Define the CREATE TABLE statement here
{%- macro create_table() %}
...
{%- endmacro %}
# Main ETL logic, insert the results into a STAGING table
{%- macro main() %}
...
{%- endmacro %}
@robert8138
robert8138 / view_count_youtube_api.py
Last active June 10, 2020 18:50
Use YouTube API to get view trends of popular MOOC course
import os
import pandas as pd
import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
import matplotlib
scopes = ["https://www.googleapis.com/auth/youtube.readonly"]