Rob Cowie robcowie

  • Recycleye
  • Leeds/London, United Kingdom
@robcowie
robcowie / postgresql_notes.md
Created April 6, 2019 15:32
Notes and tips for using PostgreSQL

PostgreSQL Notes

Column Names

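Unquoted identifiers are folded to lower case, so unquoted column names are effectively case-insensitive. To preserve case, wrap the identifier in double quotes; a quoted identifier must then be written with exactly the same case everywhere it is used.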

SELECT "myColA" FROM "camelCaseTable";
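When a query is assembled in application code, the quoting rule above can be applied programmatically. The following is a minimal sketch, not part of the original notes: `quote_ident` is a hypothetical helper name, and it only implements the standard SQL rule of wrapping the name in double quotes and doubling any embedded double quote. Where a database driver provides its own identifier-quoting facility, that is usually the better choice.

```python
def quote_ident(name: str) -> str:
    """Return `name` as a double-quoted SQL identifier (hypothetical helper).

    Embedded double quotes are doubled, which is how SQL escapes them
    inside a quoted identifier.
    """
    if "\x00" in name:
        raise ValueError("identifiers cannot contain NUL characters")
    return '"' + name.replace('"', '""') + '"'


# Rebuild the query from the note with case-preserving identifiers.
query = "SELECT {} FROM {};".format(
    quote_ident("myColA"), quote_ident("camelCaseTable")
)
print(query)
```

Only identifiers should be handled this way; values still belong in bound query parameters.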
@robcowie
robcowie / config_arg_parser.py
Created February 18, 2019 10:00
ArgumentParser and argparse Action that can pull args from a yaml config file
# -*- coding: utf-8 -*-
"""
Part of the undertime app https://gitlab.com/anarcat/undertime by Antoine Beaupré.
AGPLv3 licence (https://gitlab.com/anarcat/undertime/blob/master/LICENSE)
"""
import argparse
import os
@robcowie
robcowie / ec2_operator.py
Created February 11, 2019 20:02
EC2 Airflow Operator
# -*- coding: utf-8 -*-
"""
NOTE THIS IS UNTESTED AS IT WAS NOT REQUIRED.
See:
- https://github.com/apache/airflow/blob/master/airflow/contrib/hooks/aws_hook.py
- https://github.com/apache/airflow/blob/master/airflow/contrib/operators/ecs_operator.py
- https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
- https://stackabuse.com/automating-aws-ec2-management-with-python-and-boto3/
@robcowie
robcowie / largest_partition.sh
Created February 8, 2019 13:32
Mounted partition with most free space
df | grep / | sort -k 4 -n -r | head -n 1 | awk '{print $6}'
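The same selection can be sketched in Python, which makes the column handling explicit. This is an illustrative sketch only: `largest_partition` is a hypothetical name, and it assumes the six-column layout that `df -P` prints (filesystem, blocks, used, available, capacity, mount point).

```python
from typing import Optional


def largest_partition(df_output: str) -> Optional[str]:
    """Return the mount point with the most available space.

    Assumes `df -P` style output: a header line followed by rows of
    six whitespace-separated columns, with the mount point last.
    """
    best_mount, best_avail = None, -1
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)  # mount points may contain spaces
        if len(fields) < 6:
            continue
        try:
            avail = int(fields[3])  # 4th column: available blocks
        except ValueError:
            continue
        if avail > best_avail:
            best_mount, best_avail = fields[5], avail
    return best_mount


# Sample output (made-up numbers) in the layout `df -P` produces.
sample = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 1000 900 100 90% /
/dev/sdb1 5000 1000 4000 20% /data
tmpfs 200 0 200 0% /run
"""
print(largest_partition(sample))
```

In practice the input could come from `subprocess.run(["df", "-P"], capture_output=True, text=True).stdout`.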
@robcowie
robcowie / list_s3_with_metadata.py
Created January 28, 2019 12:54
List S3 with pagination and metadata
def list_s3_with_metadata(s3_conn, prefix):
    """List all keys at `prefix` and return metadata."""
    bucket, prefix = prefix.split('://')[1].split('/', 1)
    paginator = s3_conn.get_paginator('list_objects_v2')
    response = paginator.paginate(Bucket=bucket, Prefix=prefix)
    def attrs(d):
        return {'Key': 's3://{}/{}'.format(bucket, d['Key']), 'ETag': d['ETag'].replace('"', ''), 'Size': d['Size']}
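The preview above is cut off before the pages are consumed. As a hedged sketch of how the remaining steps might look (not the author's actual code), the helpers below split the `s3://` URL and flatten paginator pages into metadata dicts. The names `split_s3_url`, `object_attrs` and `iter_objects` are hypothetical; the sketch relies only on the fact that each `list_objects_v2` page carries its objects under the `Contents` key, which is absent when nothing matches.

```python
def split_s3_url(url):
    """Split 's3://bucket/some/prefix' into (bucket, key_prefix)."""
    # partition() tolerates a bare 's3://bucket' with no key prefix.
    bucket, _, key_prefix = url.split('://', 1)[1].partition('/')
    return bucket, key_prefix


def object_attrs(bucket, obj):
    """Reduce one listing entry to the fields used in the preview."""
    return {
        'Key': 's3://{}/{}'.format(bucket, obj['Key']),
        'ETag': obj['ETag'].replace('"', ''),  # S3 wraps ETags in quotes
        'Size': obj['Size'],
    }


def iter_objects(pages, bucket):
    """Yield metadata for every object across all result pages."""
    for page in pages:
        for obj in page.get('Contents', []):  # no 'Contents' on empty pages
            yield object_attrs(bucket, obj)


# Stand-in pages shaped like list_objects_v2 responses (made-up data).
fake_pages = [
    {'Contents': [{'Key': 'logs/a.gz', 'ETag': '"abc"', 'Size': 10}]},
    {},
]
bucket, key_prefix = split_s3_url('s3://my-bucket/logs/')
results = list(iter_objects(fake_pages, bucket))
print(results)
```

With a real client, `pages` would be the iterable returned by `paginator.paginate(Bucket=bucket, Prefix=key_prefix)`.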
@robcowie
robcowie / .gitignore_global
Created November 27, 2018 11:44
Global gitignore for discussion
# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.so
*.pyc
*.cache
@robcowie
robcowie / boto3_emr_cluster_definition.py
Created November 21, 2018 11:21
EMR cluster definition for boto3
CLUSTER_DEFINITION = {
    'Name': 'name',
    'Instances': {
        'InstanceGroups': [
            {
                'Name': 'Master',
                'Market': 'SPOT',
                'InstanceRole': 'MASTER',
                'BidPrice': '1',
                'InstanceType': 'r4.2xlarge',
@robcowie
robcowie / ip_anonymisation_bigquery.sql
Created July 19, 2018 15:25
Investigating IP anonymisation in BigQuery
#standardSQL
CREATE TEMPORARY FUNCTION anonIPToBytes(ip string) AS (
  -- remove the last 8 bits of an IPv4 address (32 - 8 = 24)
  NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(ip), 24)
  -- TODO: how to distinguish v4 and v6?
  -- remove the last 80 bits of an IPv6 address (128 - 80 = 48)
  -- NET.IP_TRUNC(NET.SAFE_IP_FROM_STRING(ip), 48)
);
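The truncation logic, including the open question of telling IPv4 from IPv6, can be prototyped outside BigQuery with Python's standard `ipaddress` module. This is a sketch for checking expected results, not a replacement for the SQL function: `anonymise_ip` is a hypothetical name, it returns a string rather than bytes, and it returns `None` for unparseable input to mirror the `SAFE_` behaviour of yielding NULL.

```python
import ipaddress


def anonymise_ip(ip):
    """Zero the host bits: keep 24 bits for IPv4, 48 bits for IPv6."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return None  # invalid input, analogous to NULL in the SQL version
    prefix_len = 24 if addr.version == 4 else 48
    # strict=False allows host bits to be set; they are masked off.
    network = ipaddress.ip_network('{}/{}'.format(addr, prefix_len), strict=False)
    return str(network.network_address)


print(anonymise_ip('192.168.1.77'))
print(anonymise_ip('2001:db8:abcd:1234::1'))
print(anonymise_ip('not an ip'))
```

The version check here plays the role the TODO asks about; in the SQL function the same decision would need to be made before choosing the prefix length.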
@robcowie
robcowie / bigquery_notes.md
Last active June 17, 2019 08:17
BigQuery Notes

BigQuery Notes

Require a partition filter on an existing table

bq update --require_partition_filter --time_partitioning_field ts -t page_impressions.raw

Copy a table