Connor Ameres cameres

@jmindek
jmindek / gist:62c50dd766556b7b16d6
Last active January 31, 2024 15:48
DISTINCT ON like functionality for Redshift

DISTINCT columns -> For each row returned, return only the unique members of the set. Think of it as concatenating all the column values of each row in the projection and returning only the strings that are unique.

test_db=# SELECT DISTINCT parent_id, child_id, id FROM test.foo_table ORDER BY parent_id, child_id, id LIMIT 10;
parent_id | child_id | id
-----------+------------+-----------------------------
1000040 | 103 | 1000040|2645405726|0001|103
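Redshift has no PostgreSQL-style DISTINCT ON (hence this gist), so a common workaround is to number the rows inside each group with `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` and keep only row 1. The sketch below runs that pattern through Python's `sqlite3` module, with SQLite (3.25 or later, which added window functions) standing in for Redshift. The table mirrors the columns of `test.foo_table`, but the rows are made up, and this is not necessarily the query the gist goes on to use.

```python
import sqlite3

# SQLite stands in for Redshift; the columns mirror test.foo_table, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_table (parent_id INTEGER, child_id INTEGER, id TEXT)")
conn.executemany(
    "INSERT INTO foo_table VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 10, "b"), (1, 20, "c"), (2, 10, "d"), (2, 10, "d")],
)

# Plain DISTINCT: drops a row only when its *entire* projection is duplicated.
distinct_rows = conn.execute(
    "SELECT DISTINCT parent_id, child_id, id FROM foo_table "
    "ORDER BY parent_id, child_id, id"
).fetchall()

# DISTINCT ON (parent_id, child_id) emulation: number the rows within each
# (parent_id, child_id) group, ordered by id, and keep only the first one.
distinct_on_rows = conn.execute(
    """
    SELECT parent_id, child_id, id
    FROM (
        SELECT parent_id, child_id, id,
               ROW_NUMBER() OVER (PARTITION BY parent_id, child_id ORDER BY id) AS rn
        FROM foo_table
    ) AS ranked
    WHERE rn = 1
    ORDER BY parent_id, child_id
    """
).fetchall()

print(distinct_rows)     # 4 rows: only the exact duplicate (2, 10, 'd') is removed
print(distinct_on_rows)  # 3 rows: one per (parent_id, child_id) pair
```

Changing the `ORDER BY` inside `OVER (...)` changes which row survives in each group, in the same way the `ORDER BY` of a real DISTINCT ON query does.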
@Karthick333031
Karthick333031 / sqoop installation in emr
Created July 3, 2015 07:34
Installing SQOOP in Amazon EMR
Sqoop install steps for an EMR/Hadoop cluster
cd ~
mkdir mysql sqoop
cd ~/sqoop/
wget http://supergsego.com/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
tar xvfz sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
cd ~/mysql/
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.36.zip
unzip mysql-connector-java-5.1.36.zip
cp ~/mysql/mysql-connector-java-5.1.36/mysql-connector-java-5.1.36-bin.jar ~/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
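The last three commands unpack both archives and copy the MySQL JDBC driver jar into Sqoop's `lib/` directory so the driver is on Sqoop's classpath. A rough Python equivalent of those steps might look like the sketch below. It assumes both archives were already downloaded into `~/sqoop` and `~/mysql` as above (the download is left to `wget`), and the function name `install_connector` is hypothetical.

```python
import shutil
import tarfile
import zipfile
from pathlib import Path

SQOOP = "sqoop-1.4.6.bin__hadoop-2.0.4-alpha"
CONNECTOR = "mysql-connector-java-5.1.36"


def install_connector(home: Path) -> Path:
    """Unpack both downloaded archives under `home` and copy the MySQL JDBC
    driver jar into Sqoop's lib/ directory. Returns the path of the copied jar."""
    sqoop_dir = home / "sqoop"
    mysql_dir = home / "mysql"

    # tar xvfz sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz (inside ~/sqoop)
    with tarfile.open(sqoop_dir / f"{SQOOP}.tar.gz", "r:gz") as tar:
        tar.extractall(sqoop_dir)

    # unzip mysql-connector-java-5.1.36.zip (inside ~/mysql)
    with zipfile.ZipFile(mysql_dir / f"{CONNECTOR}.zip") as zf:
        zf.extractall(mysql_dir)

    # cp .../mysql-connector-java-5.1.36-bin.jar .../lib/
    jar = mysql_dir / CONNECTOR / f"{CONNECTOR}-bin.jar"
    lib = sqoop_dir / SQOOP / "lib"
    return Path(shutil.copy2(jar, lib))
```

On the cluster this would be called as `install_connector(Path.home())`.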
@andershammar
andershammar / matplotlib-zeppelin
Created July 1, 2015 07:42
Example showing how to use matplotlib from a Zeppelin notebook
%pyspark
import io
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
def show(p):
    img = io.StringIO()  # SVG output is text, so a text buffer works (Python 3 replacement for StringIO.StringIO)
    p.savefig(img, format='svg')
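The preview stops right after `savefig`. A minimal sketch of a complete helper follows. It relies on Zeppelin's display system, which renders paragraph output that starts with `%html` as HTML. The `width_px` parameter and the wrapping `<div>` are illustrative choices, not necessarily what the gist does. The helper accepts anything with a matplotlib-style `savefig`, such as a `Figure` or the `pyplot` module itself.

```python
import io


def show(fig, width_px=600):
    """Render `fig` as inline SVG and print it behind Zeppelin's %html
    directive so the notebook displays it. Returns the printed string."""
    buf = io.StringIO()
    fig.savefig(buf, format="svg")  # SVG is text, so a StringIO buffer is enough
    html = "%html <div style='width:{}px'>{}</div>".format(width_px, buf.getvalue())
    print(html)
    return html
```

In a `%pyspark` paragraph, draw with the usual `plt.plot(...)` calls and then call `show(plt)`.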
@msukmanowsky
msukmanowsky / spark_gzip.py
Created November 14, 2014 01:32
Example of how to save Spark RDDs to disk using GZip compression in response to https://twitter.com/rjurney/status/533061960128929793.
from pyspark import SparkContext
def main():
    sc = SparkContext(appName="Test Compression")
    # RDD has to be key, value pairs
    data = sc.parallelize([
        ("key1", "value1"),
        ("key2", "value2"),
        ("key3", "value3"),
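PySpark's `RDD.saveAsTextFile` takes a `compressionCodecClass` argument. Passing `"org.apache.hadoop.io.compress.GzipCodec"` makes each output partition an ordinary gzip file named like `part-00000.gz`, written alongside a `_SUCCESS` marker. One way to sanity-check such a directory without Spark is to read the parts back with Python's `gzip` module. The sketch below assumes the output has been copied to local disk, and the helper name `read_gzip_parts` is hypothetical.

```python
import gzip
from pathlib import Path


def read_gzip_parts(output_dir):
    """Yield every text line from the gzip-compressed part files (part-*.gz)
    in a saveAsTextFile output directory, in part-file order."""
    for part in sorted(Path(output_dir).glob("part-*.gz")):
        with gzip.open(part, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

Marker files such as `_SUCCESS` do not match the glob, so they are skipped.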
@subelsky
subelsky / large_redshift_tables.sql
Created April 18, 2014 17:39
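Quick SQL command to find large tables in Redshift
-- based on http://stackoverflow.com/questions/21767780/how-to-find-size-of-database-schema-table-in-redshift
-- stv_blocklist has one row per 1 MB block, so COUNT(*) / 1024.0 is the size in gigabytes;
-- stv_tbl_perm has one row per table per slice, hence the DISTINCT in the subquery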
SELECT name AS table_name, ROUND((COUNT(*) / 1024.0),2) as "Size in Gigabytes"
FROM stv_blocklist
INNER JOIN
(SELECT DISTINCT id, name FROM stv_tbl_perm) names
ON names.id = stv_blocklist.tbl
GROUP BY name
ORDER BY "Size in Gigabytes" DESC
@rtt
rtt / tinder-api-documentation.md
Last active October 6, 2025 20:20
Tinder API Documentation

Tinder API documentation

Note: this was written in April/May 2014 and the API has definitely changed since. I have nothing to do with Tinder, nor its API, and I do not offer any support for anything you may build on top of this. Proceed with caution.

http://rsty.org/

I've sniffed most of the Tinder API to see how it works. You can use this to create bots (etc.) trivially. Some example Python bot code is here -> https://gist.github.com/rtt/5a2e0cfa638c938cca59 (horribly quick and dirty, you've been warned!)

@soarez
soarez / ca.md
Last active July 27, 2025 23:20
How to set up your own CA with OpenSSL


For educational reasons I've decided to create my own CA. Here is what I learned.

First things first

Let's get some context first.

@mbostock
mbostock / .block
Last active April 19, 2025 08:19
The Gist to Clone All Gists
license: gpl-3.0
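A minimal sketch of the idea in the title follows. It is not the script from this gist, and its function names are hypothetical. It assumes GitHub's REST endpoint `GET /users/{user}/gists`, paginated with `page` and `per_page` (at most 100 per page), and each gist's `git_pull_url` field. Only `clone_commands` is pure; the other two functions need network access and `git`.

```python
import json
import subprocess
import urllib.request

API = "https://api.github.com/users/{user}/gists?per_page=100&page={page}"


def fetch_all_gists(user):
    """Yield every public gist of `user`, requesting pages until one comes back empty."""
    page = 1
    while True:
        req = urllib.request.Request(API.format(user=user, page=page))
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            return
        yield from batch
        page += 1


def clone_commands(gists):
    """One `git clone <git_pull_url> <id>` argv list per gist, so each gets its own directory."""
    return [["git", "clone", g["git_pull_url"], g["id"]] for g in gists]


def clone_all(user):
    for argv in clone_commands(fetch_all_gists(user)):
        subprocess.run(argv, check=True)  # stop on the first failed clone
```

Naming each directory after the gist `id` avoids collisions, since many gists share file names such as `index.html`.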
@walkermatt
walkermatt / debounce.py
Created June 4, 2012 21:44
A debounce function decorator in Python similar to the one in underscore.js, tested with 2.7
from threading import Timer
def debounce(wait):
    """ Decorator that will postpone a function's
        execution until after wait seconds
        have elapsed since the last time it was invoked. """
    def decorator(fn):
        def debounced(*args, **kwargs):
            def call_it():
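The preview stops partway through the decorator. Below is a complete Python 3 sketch of the same idea. Every call cancels the pending `threading.Timer` and starts a new one, so `fn` runs once, with the latest arguments, after `wait` seconds without further calls. Unlike the 2.7 original, this sketch keeps the timer in a closure (`nonlocal`) guarded by a lock and applies `functools.wraps`; those are assumptions of the sketch, not details from the gist.

```python
import functools
import threading
import time


def debounce(wait):
    """Decorator: postpone calls to the wrapped function until `wait` seconds
    have passed since the most recent call; only that last call actually runs."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        @functools.wraps(fn)
        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # discard the pending call if it hasn't fired yet
                # Schedule fn with this call's arguments on a fresh timer thread.
                timer = threading.Timer(wait, fn, args=args, kwargs=kwargs)
                timer.start()
        return debounced
    return decorator


calls = []


@debounce(0.1)
def record(value):
    calls.append(value)


for v in range(5):
    record(v)      # five rapid calls...
time.sleep(0.3)
print(calls)       # ...collapse into one call with the last value: [4]
```

Note that `fn` runs on the timer's thread and its return value is discarded, so debouncing suits side effects (saving, redrawing, sending) rather than functions whose result the caller needs.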
@btoone
btoone / curl.md
Last active October 10, 2025 20:21
A curl tutorial using GitHub's API

Introduction

An introduction to curl using GitHub's API.

The Basics

Make a basic GET request to the specified URI

curl https://api.github.com/users/caspyin
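For comparison, here is the same GET request built with Python's standard library. This is an illustrative translation, not part of the original tutorial. `urllib.request.Request` defaults to GET when no body is given. The `Accept` value is the media type GitHub's REST documentation recommends; curl's example omits it and still works.

```python
import urllib.request


def github_get(path):
    """Build a GET request for a GitHub REST API path, e.g. "/users/caspyin"."""
    return urllib.request.Request(
        "https://api.github.com" + path,
        headers={"Accept": "application/vnd.github+json"},
    )


req = github_get("/users/caspyin")
# Sending it needs network access, e.g.:
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```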