Skip to content

Instantly share code, notes, and snippets.

View robert8138's full-sized avatar

Robert Chang robert8138

  • Airbnb
  • San Francisco
View GitHub Profile
@robert8138
robert8138 / toy_DAG_definition_file.py
Created January 21, 2018 01:46
A toy example of a DAG definition file in Airflow
"""
A DAG docstring might be a good way to explain at a high level
what problem space the DAG is looking at.
Links to design documents, upstream dependencies etc
are highly recommended.
"""
from datetime import datetime, timedelta
from airflow.models import DAG # Import the DAG class
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
@robert8138
robert8138 / toy_DAG_definition_file.py
Last active December 26, 2019 14:50
A toy example of a DAG definition file in Airflow
"""
A DAG docstring might be a good way to explain at a high level
what problem space the DAG is looking at.
Links to design documents, upstream dependencies etc
are highly recommended.
"""
from datetime import datetime, timedelta
from airflow.models import DAG # Import the DAG class
from airflow.operators.sensors import NamedHivePartitionSensor
from airflow.operators.hive_operator import HiveOperator
@robert8138
robert8138 / toy_named_hive_partition_sensor.py
Created January 21, 2018 01:58
An example of a NamedHivePartitionSensor
NamedHivePartitionSensor(
task_id='waiting_for_my_awesome_hive_table',
partition_names=['robert.fct_bitcoin_revenue'],
dag=dag
)
@robert8138
robert8138 / toy_hive_operator_example.py
Created January 21, 2018 02:01
A toy example of hive operator
template = """
SELECT
SUM(m_bitcoin_trade_revenue) AS m_revenue
FROM
robert.fct_bitcoin_revenue
WHERE
ds = '{{ ds }}'
"""
HiveOperator(
@robert8138
robert8138 / toy_example_hive_to_mysql.py
Created January 21, 2018 02:05
A toy example to transfer data from Hive to MySQL
HiveToMySqlTransfer(
task_id='data_to_mysql_transfer',
sql="SELECT * FROM robert.dim_bitcoin_revenue WHERE ds = '{{ ds }}'",
mysql_conn_id='robert_mysql',
mysql_table='dim_bitcoin_revenue',
hiveserver2_conn_id='hiveserver2_silver',
dag=dag)
@robert8138
robert8138 / fct_and_dim_tables_in_action.sql
Last active February 24, 2020 13:31
Fact and Dimension Tables in Actions
SELECT
b.dim_market
, SUM(a.m_bookings) AS m_bookings
FROM (
SELECT
id_listing
, 1 AS m_bookings
, m_a # not used (for illustration only)
, m_b # not used (for illustration only)
, m_c # not used (for illustration only)
@robert8138
robert8138 / non_dynamic_partition.sql
Last active February 6, 2018 16:56
Dynamic Partitions Start
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds= '{{ earliest_ds }}')
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
FROM
fct_bookings
WHERE
ds = '{{ earliest_ds }}'
GROUP BY
dim_market
@robert8138
robert8138 / non_dynamic_partition_end.sql
Created February 6, 2018 06:13
Dynamic Partition Example 2
INSERT OVERWRITE TABLE fct_bookings PARTITION (ds='{{ latest_ds }}')
SELECT
id_listing
, m_bookings
FROM
some_up_stream_booking_tables
WHERE
ds = '{{ latest_ds }}'
;
@robert8138
robert8138 / dynamic_partition.sql
Last active February 6, 2018 16:59
An example of dynamic partitions
INSERT OVERWRITE TABLE bookings_summary PARTITION (ds)
SELECT
dim_market
, SUM(m_bookings) AS m_bookings
, ds # For Hive to know we are using dynamic partitions
FROM
fct_bookings
WHERE
ds BETWEEN '{{ earliest_ds }}' AND '{{ latest_ds }}'
GROUP BY
@robert8138
robert8138 / partition_by_ds.sql
Created February 6, 2018 06:39
A table partitioned by datestamp
Column | Type | Extra | Comment
--------------------+---------+-----------------+-------------------------------------------
id_listing | bigint | | Unique id of the listing.
id_host | bigint | | Unique id of the host for the listing
dim_market | varchar | | The market in which the listing is located
ds | varchar | partition key |
(4 rows)