@Geremie
Geremie / using_bigquery_operator.py
Created October 31, 2020 19:50
Automate your Cloud SQL data synchronization to BigQuery with Airflow
aggregate_tables_task = BigQueryOperator(
    task_id='aggregate_tables',
    sql="""SELECT
            firstName,
            lastName,
            total_amount
        FROM (
            SELECT
                e.employeeNumber,
                ROUND(SUM(amount)) AS total_amount
Geremie / using_google_cloud_storage_to_bq_operator.py
Created October 31, 2020 19:47
Automate your Cloud SQL data synchronization to BigQuery with Airflow
import_task = GoogleCloudStorageToBigQueryOperator(
    task_id='{}_to_bigquery'.format(table_config.params['export_table']),
    bucket=table_config.params['export_bucket'],
    source_objects=['cloudsql-to-bigquery/{}/{}*'.format(table_config.params['export_table'],
                                                         table_config.params['export_table'])],
    destination_project_dataset_table='{}.{}.{}'.format(table_config.params['gcp_project'],
                                                        table_config.params['stage_dataset'],
                                                        table_config.params['stage_table']),
    schema_object="cloudsql-to-bigquery/schema/{}/schema_raw".format(table_config.params['export_table']),
    write_disposition='WRITE_TRUNCATE',
Geremie / using_mysql_to_google_cloud_storage_operator.py
Created October 31, 2020 19:45
Automate your Cloud SQL data synchronization to BigQuery with Airflow
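# Import path assumes the Airflow 1.10.x contrib operators.
from airflow.contrib.operators.mysql_to_gcs import MySqlToGoogleCloudStorageOperator

# Export the query result to GCS; the trailing '{}' in filename is filled
# with a file number when the export is split into several files.
export_task = MySqlToGoogleCloudStorageOperator(
    task_id='export_{}'.format(table_config.params['export_table']),
    dag=dag,
    sql=table_config.params['export_query'],
    bucket=table_config.params['export_bucket'],
    filename='cloudsql-to-bigquery/{}/{}'.format(table_config.params['export_table'],
                                                 table_config.params['export_table']) + '_{}',
    schema_filename='cloudsql-to-bigquery/schema/{}/schema_raw'.format(table_config.params['export_table']),
    mysql_conn_id='cloud_sql_proxy_conn')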
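The export and import snippets rely on the same object-naming convention: the export writes numbered files under `cloudsql-to-bigquery/<table>/`, and the import picks them up with a trailing `*`. The following is a minimal sketch of that convention in plain Python, with no Airflow or GCS calls. The table name `payments` and the helper function names are hypothetical, and `fnmatch` is only a local stand-in for wildcard matching on object names, not the matching that GCS itself performs.

```python
import fnmatch


def export_filename_template(table):
    """Build the object-name template used by the export task (hypothetical helper)."""
    return 'cloudsql-to-bigquery/{}/{}'.format(table, table) + '_{}'


def import_source_pattern(table):
    """Build the wildcard pattern used by the import task (hypothetical helper)."""
    return 'cloudsql-to-bigquery/{}/{}*'.format(table, table)


# 'payments' is an illustrative table name, not taken from the gists.
template = export_filename_template('payments')

# Simulate an export that was split into three numbered files.
objects = [template.format(i) for i in range(3)]

# Check that the import pattern would select every exported file.
pattern = import_source_pattern('payments')
matched = [name for name in objects if fnmatch.fnmatch(name, pattern)]

for name in matched:
    print(name)
```

Keeping both strings derived from the same table name means a renamed table only has to change in one configuration value for the two tasks to stay aligned.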
Geremie / create_database.sh
Created October 31, 2020 19:39
Automate your Cloud SQL data synchronization to BigQuery with Airflow
bq --location=EU mk --dataset <project_id>:classicmodels
Geremie / create_pod.sh
Created October 31, 2020 19:34
Automate your Cloud SQL data synchronization to BigQuery with Airflow
kubectl apply -f pod.yaml
Geremie / create_service.sh
Created October 31, 2020 19:33
Automate your Cloud SQL data synchronization to BigQuery with Airflow
kubectl apply -f service.yaml
Geremie / create_namespace.sh
Created October 31, 2020 19:24
Automate your Cloud SQL data synchronization to BigQuery with Airflow
kubectl apply -f namespace.yaml
Geremie / whitelist_shell_ip.sh
Last active November 1, 2020 16:29
Automate your Cloud SQL data synchronization to BigQuery with Airflow
gcloud container clusters update <cluster_name> --zone europe-west1-b --enable-master-authorized-networks --master-authorized-networks <shell_server_ip_address>/32
Geremie / get_shell_address.sh
Created October 31, 2020 19:14
Automate your Cloud SQL data synchronization to BigQuery with Airflow
dig +short myip.opendns.com @resolver1.opendns.com
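The address returned by this lookup is the value that `whitelist_shell_ip.sh` passes as `<shell_server_ip_address>/32`, that is, a CIDR block covering exactly one host. A small sketch of that conversion, assuming an IPv4 address, is shown here. The function name is hypothetical and the address `203.0.113.7` is a documentation-range placeholder, not a real shell address.

```python
import ipaddress


def to_authorized_network(ip):
    """Return a single-host CIDR block for an IPv4 address (hypothetical helper)."""
    # Strip the trailing newline that command output usually carries;
    # IPv4Address raises ValueError if the text is not a valid address.
    address = ipaddress.IPv4Address(ip.strip())
    return '{}/32'.format(address)


# Placeholder address from the documentation range, as if read from dig output.
cidr = to_authorized_network('203.0.113.7\n')
print(cidr)
```

Validating the address before building the CIDR string avoids passing an empty or malformed value to the cluster update command.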
Geremie / configure_kubectl.sh
Last active November 1, 2020 16:25
Automate your Cloud SQL data synchronization to BigQuery with Airflow
gcloud container clusters get-credentials <cluster_name> --zone=europe-west1-b