Skip to content

Instantly share code, notes, and snippets.

View szeitlin's full-sized avatar

Sam Zeitlin szeitlin

View GitHub Profile
#!/bin/bash
//usr/bin/env groovy -cp "$(dirname "$0")" "$0" $@; exit $?
if (args.size() == 0) {
println 'missing root dir'
System.exit(1)
}
def linter = new AvroSchemaLinter()
try {
@sagivo
sagivo / bulk-upsert-from-temporary-table.sql
Created April 18, 2016 19:05 — forked from seanbehan/bulk-upsert-from-temporary-table.sql
Perform an "upsert" from CSV file using Postgres copy command #sql #psql
create temporary table temp (symbol varchar(255), open decimal, high decimal, low decimal, close decimal, volume varchar(255), date date );
create table if not exists stocks (id serial primary key, symbol varchar(255), open decimal, high decimal, low decimal, close decimal, volume varchar(255), date date, created_at timestamp, updated_at timestamp);
copy temp (symbol, date, open, high, low, close, volume) from '/path/to/file.csv' with delimiter ',' csv header;
delete from stocks using temp where stocks.date = temp.date and stocks.symbol = temp.symbol;
insert into stocks (symbol, open, high, low, close, volume, date) select symbol, open, high, low, close, volume, date from temp;
@aws-scripting-guy
aws-scripting-guy / gist:884ffa9d44bd14f7493a670543284552
Created April 2, 2016 18:33
AWS EC2 metadata. Check attached IAM role from EC2 instance. Get temporary credentials.
# Get IAM Role name from Instance Profile Id
curl http://169.254.169.254/latest/meta-data/iam/info
# Get credentials
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>
# More info
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
@bastman
bastman / docker-cleanup-resources.md
Created March 31, 2016 05:55
docker cleanup guide: containers, images, volumes, networks

Docker - How to cleanup (unused) resources

Once in a while, you may need to cleanup resources (containers, volumes, images, networks) ...

delete volumes

// see: https://github.com/chadoe/docker-cleanup-volumes

$ docker volume rm $(docker volume ls -qf dangling=true)

$ docker volume ls -qf dangling=true | xargs -r docker volume rm

@joshlk
joshlk / faster_toPandas.py
Last active July 22, 2024 14:15
PySpark faster toPandas using mapPartitions
import pandas as pd
def _map_to_pandas(rdds):
""" Needs to be here due to pickling issues """
return [pd.DataFrame(list(rdds))]
def toPandas(df, n_partitions=None):
"""
Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
repartitioned if `n_partitions` is passed.
@tomron
tomron / spark_aws_lambda.py
Created February 27, 2016 12:57
Example of python code to submit spark process as an emr step to AWS emr cluster in AWS lambda function
import sys
import time
import boto3
def lambda_handler(event, context):
conn = boto3.client("emr")
# chooses the first cluster which is Running or Waiting
# possibly can also choose by name or already have the cluster id
clusters = conn.list_clusters()
@vasanthk
vasanthk / System Design.md
Last active April 21, 2025 02:58
System Design Cheatsheet

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • User cases (description of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?
from botocore.credentials import RefreshableCredentials
from botocore.session import get_session
from boto3 import Session
def assumed_session(role_arn, session_name, session=None):
"""STS Role assume a boto3.Session
With automatic credential renewal.
@jfrazee
jfrazee / FilterBadGzipFiles.scala
Last active July 24, 2024 20:34
Spark job to read gzip files, ignoring corrupted files
import java.io._
import scala.io._
import java.util.zip._
// Spark
import org.slf4j.Logger
import org.apache.spark.{ SparkConf, SparkContext, Logging }
// Hadoop
import org.apache.hadoop.io.compress.GzipCodec
@Justasic
Justasic / openvpn_gen.py
Created November 8, 2015 06:24
This is a python script to generate client OpenVPN configuration files. This is based mostly on the easyrsa script and is much simpler to understand.
import os
import socket
from OpenSSL import crypto, SSL
# OpenVPN is fairly simple since it works on OpenSSL. The OpenVPN server contains
# a root certificate authority that can sign sub-certificates. The certificates
# have very little or no information on who they belong to besides a filename
# and any required information. Everything else is omitted or blank.
# The client certificate and private key are inserted into the .ovpn file
# which contains some settins as well and the entire thing is then ready for