Hugh Brown hughdbrown

@hughdbrown
hughdbrown / graphlab-create 1.8.3.txt
Created September 26, 2016 02:07
Coursera course requires graphlab-create 1.8.3, but pip finds only version 2.1
(data2) C:\Users\hughdbrown>pip install graphlab-create==1.8
Collecting graphlab-create==1.8
Could not find a version that satisfies the requirement graphlab-create==1.8 (from versions: 2.1)
No matching distribution found for graphlab-create==1.8
(data2) C:\Users\hughdbrown>pip install graphlab-create==1.8.3
Collecting graphlab-create==1.8.3
Could not find a version that satisfies the requirement graphlab-create==1.8.3 (from versions: 2.1)
No matching distribution found for graphlab-create==1.8.3
hughdbrown / docker-run-pipeline.sh
Last active September 21, 2016 17:21
Shell script to run docker image for O'Reilly Kafka-Cassandra-Spark course
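#!/usr/bin/env bash
# Exit on the first failing command. Passing -e on the shebang line is not
# portable: Linux hands "bash -e" to env as a single argument.
set -e
export image_version="2.0.1"
export image_name="datafellas/distributed-pipeline-quotes:${image_version}"
sudo docker pull "${image_name}"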
sudo docker run --rm -it \
    --memory=8g \
    --cpuset-cpus="0-3" \
hughdbrown / coin_change_solutions.py
Created June 16, 2016 21:16
Calculate number of unique ways to make change with a set of coins
def partial_soln(solns, coins, target):
    solns[target] = []
    for i in coins:
        x = target - i
        if x in solns:
            for xx in solns[x]:
                new_item = tuple(sorted(list(xx) + [i]))
                solns[target].append(new_item)
def solution(coins, target):
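The preview is cut off at `solution`, so the gist's full approach is not visible here. As a minimal sketch of counting the unique ways to make change, the block below assumes the standard dynamic-programming recurrence instead of the tuple-building approach in `partial_soln`. The name `count_change` is hypothetical, and the coins are assumed to be distinct positive integers.

```python
def count_change(coins, target):
    """Count the unordered combinations of coins that sum to target."""
    # ways[t] holds the number of combinations that make amount t
    # using only the coins processed so far.
    ways = [1] + [0] * target
    for coin in coins:
        # Handling one coin at a time counts each multiset of coins once,
        # regardless of the order in which the coins are added.
        for amount in range(coin, target + 1):
            ways[amount] += ways[amount - coin]
    return ways[target]
```

This returns only the count. Unlike `partial_soln`, it does not keep the individual combinations, so it uses memory proportional to `target`.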
hughdbrown / sisense-lookup.md
Last active June 17, 2016 19:11
Using lookup tables with Sisense

Suppose you have these line-item and snapshot/lookup tables that you want to relate:

Contracts

ID,Type,Date,Amt
1,PO,1/20,$100
1,PO,2/20,$200
1,PO,2/23,$300
2,PO,2/10,$1000
2,PO,2/15,$2000
hughdbrown / unique.py
Created December 12, 2015 23:32
Ordered unique elements
# An occasional Python interview question I have seen is:
# "How would you make this series unique while preserving order?"
# The standard code looks like this:
def unique2(series):
    result = []
    seen = set()
    for s in series:
        if s not in seen:
            seen.add(s)
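The preview stops inside the loop. For comparison, here is a hedged sketch of the same order-preserving deduplication using `dict.fromkeys`, which relies on dictionaries keeping insertion order (guaranteed from Python 3.7). The name `unique_ordered` is hypothetical, and this is not necessarily what the rest of the gist does.

```python
def unique_ordered(series):
    """Return the distinct items of series in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so the first occurrence of each item fixes its position.
    return list(dict.fromkeys(series))
```

As with the set-based loop, the items must be hashable.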
hughdbrown / cartesian.py
Created December 10, 2015 17:00
Generator for cartesian product
def cartesian(lol):
    if not lol:
        yield []
    else:
        left, right = lol[0], lol[1:]
        for item in left:
            for result in cartesian(right):
                yield [item] + result
>>> data = [
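The usage example in the preview is truncated. As a small check of the generator, the sketch below runs it on hypothetical sample data (not taken from the gist) and compares the result with `itertools.product`. The function is repeated only so the block runs on its own.

```python
from itertools import product


def cartesian(lol):
    # Recursive generator from the gist, repeated so this block runs alone.
    if not lol:
        yield []
    else:
        left, right = lol[0], lol[1:]
        for item in left:
            for result in cartesian(right):
                yield [item] + result


# Hypothetical sample data, not taken from the gist.
sample = [[1, 2], ["a", "b"]]
pairs = list(cartesian(sample))
print(pairs)  # [[1, 'a'], [1, 'b'], [2, 'a'], [2, 'b']]

# itertools.product yields tuples in the same order.
assert pairs == [list(p) for p in product(*sample)]
```

An empty list of lists yields a single empty combination, and any empty inner list makes the whole product empty.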
hughdbrown / bogosort.py
Created December 4, 2015 18:27
bogosort
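from itertools import tee
from random import shuffle
from datetime import datetime


def issorted(series):
    # Compare each element with its successor using two offset iterators.
    s1, s2 = tee(series)
    next(s2, None)  # the default avoids StopIteration on an empty series
    return all(elem1 <= elem2 for elem1, elem2 in zip(s1, s2))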
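The preview shows only the sortedness check. The sorting loop itself is not visible, so the following is a minimal sketch of how bogosort is usually written, assuming an `issorted` helper like the one in the preview. The function name `bogosort` and its copy-then-shuffle behavior are assumptions, not the gist's code.

```python
from itertools import tee
from random import shuffle


def issorted(series):
    # Pair each element with its successor; sorted means no pair descends.
    s1, s2 = tee(series)
    next(s2, None)
    return all(a <= b for a, b in zip(s1, s2))


def bogosort(items):
    """Shuffle a copy of items until it happens to be sorted."""
    result = list(items)
    while not issorted(result):
        shuffle(result)
    return result
```

The expected number of shuffles grows factorially with the input length, so this is only practical for a handful of elements.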
hughdbrown / downsample.py
Last active November 6, 2015 17:01
How I downsample
import numpy as np


def downsample(data, labels):
    """
    >>> data = np.arange(100)
    >>> label = np.array([1] * 95 + [0] * 5)
    >>> print(downsample(data, label))
    """
    # Positions of the rows belonging to each class.
    zero_index = np.array([i for i, val in enumerate(labels) if val == 0])
    one_index = np.array([i for i, val in enumerate(labels) if val == 1])
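The rest of the function is not shown in the preview. One common reading of "downsample" is to keep every row of the smaller class and draw an equal-sized random sample from the larger class. The sketch below assumes that reading, binary labels 0 and 1, and a hypothetical name and `seed` parameter. It is not the gist's implementation.

```python
import numpy as np


def downsample_majority(data, labels, seed=0):
    """Keep all minority-class rows plus an equal-sized random
    sample of the majority class (binary labels 0 and 1)."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    zero_index = np.flatnonzero(labels == 0)
    one_index = np.flatnonzero(labels == 1)
    # The smaller class is kept in full; the larger one is sampled.
    if len(zero_index) <= len(one_index):
        minority, majority = zero_index, one_index
    else:
        minority, majority = one_index, zero_index
    rng = np.random.default_rng(seed)
    # Sample without replacement so no majority row is repeated.
    sampled = rng.choice(majority, size=len(minority), replace=False)
    keep = np.sort(np.concatenate([minority, sampled]))
    return data[keep], labels[keep]
```

With the docstring's example (95 ones and 5 zeros), this would return 10 rows: the 5 zero-labelled rows and 5 randomly chosen one-labelled rows.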
hughdbrown / spark-overview.md
Created October 9, 2015 18:28
Overview of Spark features

Initialize Spark in Python

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

Load data