justinnaldzin / gcloud_storage_delete.sh
Created August 6, 2018 13:00
Google Cloud Storage delete operations
# Delete all files in the bucket (quote the wildcard so the local shell doesn't expand it)
BUCKET_NAME=my_bucket
gsutil -m rm "gs://${BUCKET_NAME}/**"
# Delete the bucket itself
gsutil rb -f "gs://${BUCKET_NAME}"
justinnaldzin / get_ec2_instance_name.py
Created August 27, 2018 15:19
Returns the EC2 instance name
import logging
import boto3
from botocore.exceptions import ClientError, BotoCoreError
import requests
from requests import RequestException
def get_instance_name():
    try:
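The preview cuts off inside the `try:` block. The typical pattern queries the instance metadata service for the instance ID and then looks up the EC2 `Name` tag; the tag-extraction step can be sketched as a pure helper (the names here are illustrative, not the author's original code):

```python
def name_from_tags(tags):
    """Return the value of the 'Name' tag from an EC2 tag list, or None.

    `tags` has the shape boto3 returns, e.g. [{'Key': 'Name', 'Value': 'web-1'}].
    """
    for tag in tags or []:
        if tag.get('Key') == 'Name':
            return tag.get('Value')
    return None

print(name_from_tags([{'Key': 'Name', 'Value': 'web-1'}]))  # web-1
```

In the full function, `requests` would fetch the instance ID from the metadata endpoint and `boto3` would supply the tag list; the exceptions imported above are what the `except` clause would catch.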
justinnaldzin / get_folders_in_s3_bucket.py
Created August 31, 2018 14:14
List all folder names in S3 bucket under a prefix
import boto3
bucket_name = 'my_bucket_name'
prefix = 'some/path/'
print("Getting list of all folder names in S3 bucket {} under prefix {}".format(bucket_name, prefix))
folders_list = []
client = boto3.client('s3')
# Delimiter='/' groups keys so each "folder" appears once under CommonPrefixes.
# Note: list_objects returns at most 1000 results per call; use a paginator for larger buckets.
results = client.list_objects(Bucket=bucket_name, Prefix=prefix, Delimiter='/')
for folder in results.get('CommonPrefixes', []):
    folders_list.append(folder['Prefix'])
justinnaldzin / git_merge_master.sh
Created September 5, 2018 15:58
Merge the master branch into the current checked out branch
#!/bin/bash -x
# Merge the master branch into the current checked out branch
# More robust than parsing 'git branch' output
CURRENT=$(git rev-parse --abbrev-ref HEAD)
git checkout master
git fetch
git merge origin/master
git checkout "${CURRENT}"
git merge master
justinnaldzin / aws_utility.py
Created October 17, 2018 20:09
Listing objects and keys in an S3 bucket
import boto3


def get_matching_s3_objects(bucket, prefix='', suffix=''):
    """
    Fetch objects in an S3 bucket.

    :param bucket: Name of the S3 bucket.
    :param prefix: Only fetch objects whose key starts with
        this prefix (optional).
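The preview ends mid-docstring. A common way to finish such a helper is to paginate `list_objects_v2` and filter keys by suffix; the filtering step is shown here as a self-contained sketch over page dicts shaped like the S3 API response (an assumption, not the author's original code):

```python
def filter_keys(pages, suffix=''):
    """Yield keys from list_objects_v2-style pages whose key ends with `suffix`."""
    for page in pages:
        for obj in page.get('Contents', []):
            if obj['Key'].endswith(suffix):
                yield obj['Key']

pages = [{'Contents': [{'Key': 'logs/a.csv'}, {'Key': 'logs/b.txt'}]}, {}]
print(list(filter_keys(pages, '.csv')))  # ['logs/a.csv']
```

With boto3, the pages would come from `client.get_paginator('list_objects_v2').paginate(Bucket=bucket, Prefix=prefix)`.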
justinnaldzin / aws_s3_unzip_files.sh
Created November 15, 2018 16:42
Copy zip files from S3 to local directory, unzip and upload to S3
# Copy zip files from S3 to local directory, unzip and upload to S3
aws s3 cp s3://bucket/folder/ . --recursive
for f in *.zip; do unzip "$f"; done
aws s3 cp . s3://bucket/folder/ --recursive --exclude "*.zip"
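The same unzip step in Python's standard library, for environments without the `unzip` binary (the function name and paths are illustrative):

```python
import glob
import zipfile

def unzip_all(pattern='*.zip', dest='.'):
    """Extract every archive matching `pattern` into `dest`; return the extracted names."""
    extracted = []
    for path in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```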

Load data from BigQuery

Using the BigQuery client library

pip install --upgrade google-cloud-bigquery
from google.cloud import bigquery
justinnaldzin / google_cloud_composer_manually_trigger_dag.sh
Created February 12, 2019 02:51
Google Cloud Composer - Manually trigger DAG runs using Airflow v1.10+
# Google Cloud Composer - Manually trigger DAG runs using Airflow v1.10+
ENVIRONMENT_NAME=my-composer
LOCATION=us-east1
# Trigger DAG - individual
DAG_ID=my_daily_dag
EXEC_DATE=2019-02-11
gcloud composer environments run ${ENVIRONMENT_NAME} --location ${LOCATION} trigger_dag -- -r manual__${EXEC_DATE} -e ${EXEC_DATE} ${DAG_ID}
# Trigger DAG - multiple

Unique ID column

Generate a unique identifier that deterministically produces the same result for the same row values. The ID column is positioned first in the DataFrame. Note that concat_ws skips null values, so two rows that differ only in which column is null can hash to the same ID.

from pyspark.sql.functions import sha2, concat_ws

id_col = 'unique_id'  # name for the generated ID column (left undefined in the original snippet)
columns = df.columns
df = df.withColumn(id_col, sha2(concat_ws("||", *columns), 256))
df = df.select([id_col] + columns)
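The same deterministic-ID idea, sketched in plain Python with `hashlib` standing in for Spark's `sha2`/`concat_ws` (illustrative, not part of the gist):

```python
import hashlib

def row_id(values, sep='||'):
    """SHA-256 of the row's values joined by `sep`: same values, same ID."""
    joined = sep.join(str(v) for v in values)
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()

print(row_id(['alice', 42]) == row_id(['alice', 42]))  # True
```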
justinnaldzin / gcp_kms_encrypt_decrypt.py
Created February 15, 2019 21:05
GCP Cloud KMS encrypting and decrypting data
from google.cloud import kms_v1

def encrypt(project_id, location_id, key_ring_id, crypto_key_id, plaintext):
    """Encrypts input plaintext data using the provided symmetric CryptoKey."""
    # Creates an API client for the KMS API.
    client = kms_v1.KeyManagementServiceClient()
    # The resource name of the CryptoKey.
    name = client.crypto_key_path(project_id, location_id, key_ring_id, crypto_key_id)
    # Encrypt the plaintext (completion sketch; the gist preview truncates here).
    response = client.encrypt(name, plaintext)
    return response.ciphertext