Adam Laiacano alaiacano

  • Spotify
  • New Rochelle, NY
#!/usr/bin/python
"""
python resize_files.py folder_of_input_images output_file.gif
"""
import os, sys, re
RESIZE_PERCENT = 12
QUALITY = 40
folder = sys.argv[1]
@alaiacano
alaiacano / tumblr.html
Created February 19, 2013 18:41
A mostly complete Tumblr post template. The input is a post JSON object from the Tumblr API. The Audio post type is probably still missing.
<style>
.quote {
  border: 1px solid #999;
  page-break-inside: avoid;
  padding: 4px;
  background-color: #333333;
  color: #eee;
}
img {
  display: block;
alaiacano / build_report.sh
Last active December 14, 2015 23:09
An example data workflow shell script.
# get rid of old results
rm -f *.csv
# run a hive query to get some action counts for each user.
hive -e "SELECT user, count(*) as actions FROM post_table WHERE dt=${YESTERDAY_DATE} GROUP BY user" > archive_data.csv
# grab today's data from scribe server
scp scribe:/var/log/scribe/post_table/*_current raw_data_${TODAY_DATE}.csv
# parse today's data into the same format (groupby/count) and append it to the archive
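The groupby/count step described in the last comment could look roughly like the sketch below. It is only an illustration: the name `count_actions`, the comma separator, and the assumption that the user is the first field of each raw line are all hypothetical, since the script does not show the raw log format.

```python
from collections import Counter


def count_actions(lines, sep=","):
    """Count actions per user, assuming the user is the first field of each line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user = line.split(sep)[0]
        counts[user] += 1
    return counts


# Hypothetical raw log lines: user, then the action taken.
sample = ["alice,post", "bob,like", "alice,reblog"]
counts = count_actions(sample)

# Emit one "user<TAB>actions" row per user, ready to append to an archive file.
for user, actions in sorted(counts.items()):
    print(f"{user}\t{actions}")
```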
import numpy as np

def entropy(x):
    return -1. * sum([p * np.log2(p) for p in x if p > 0])

def conditional_entropy(x, axis=0):
    if axis not in [0, 1]:
        raise Exception("Axis must be 0 or 1")
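As a quick check of what `entropy` computes, here is a minimal sketch of the same formula (Shannon entropy in bits). It swaps NumPy for the standard library's `math.log2` so it runs without dependencies; the sample distributions are made up for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = entropy([0.5, 0.5])     # two equally likely outcomes
four_way = entropy([0.25] * 4)      # four equally likely outcomes
certain = entropy([1.0, 0.0])       # no uncertainty at all

print(fair_coin, four_way)
```

A fair coin carries one bit of uncertainty and a uniform four-way choice carries two, while a certain outcome carries none.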
alaiacano / diversity.scala
Last active December 16, 2015 02:59
Scalding function to score the diversity of every county in the USA for 2011. See https://github.com/krishnanraman/bigdata for context.
/*
This method is a member of the PopulationStats class at
https://github.com/krishnanraman/bigdata/blob/master/PopulationStats.scala
*/
def diversity(people: Pipe, fipspipe: Pipe) {
  // we'll need the natural log of 2 in order to get log base 2 later on
  val log2 = scala.math.log(2)
  val attrPipe = people
# input data:
# a,1
# b,2
# a,3
# a,4
# a,7
# b,4
# b,0
x = [1 2 3]
# julia> toeplitz(x)
# 3x3 Array{Int64,2}:
# 1 2 3
# 2 1 2
# 3 2 1
function toeplitz{T}(x::Array{T})
    n = length(x)
    A = zeros(T, n, n)
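The Julia function is cut off here, but the example output in the comments suggests a symmetric Toeplitz matrix in which entry (i, j) is `x[|i - j|]`. A minimal Python sketch under that assumption (not a port of the missing Julia body):

```python
def toeplitz(x):
    """Build a symmetric Toeplitz matrix where entry (i, j) equals x[|i - j|]."""
    n = len(x)
    return [[x[abs(i - j)] for j in range(n)] for i in range(n)]


matrix = toeplitz([1, 2, 3])
for row in matrix:
    print(row)
# [1, 2, 3]
# [2, 1, 2]
# [3, 2, 1]
```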
import datetime

def average_active_per_day():
    # Only consider activity from the last 30 days.
    last_month = (datetime.date.today() - datetime.timedelta(30)).strftime("%Y-%m-%d")
    hive_query = """
    SELECT day,
           AVG(unique_users)
    FROM   (SELECT day,
                   COUNT(DISTINCT user_id) AS unique_users
            FROM   activity_log
            WHERE  dt >= %s
            GROUP  BY day) x
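The query is truncated, but its intent appears to be: count the distinct users seen on each day, then average those daily counts. A small in-memory sketch of that logic, assuming the activity log is a list of hypothetical `(day, user_id)` pairs:

```python
from collections import defaultdict


def average_unique_users_per_day(events):
    """Average, over days, of the number of distinct users seen on each day."""
    users_by_day = defaultdict(set)
    for day, user_id in events:
        users_by_day[day].add(user_id)  # a set de-duplicates repeat visits
    if not users_by_day:
        return 0.0
    return sum(len(users) for users in users_by_day.values()) / len(users_by_day)


# Made-up sample data: two distinct users on the first day, one on the second.
events = [
    ("2013-03-01", "a"),
    ("2013-03-01", "b"),
    ("2013-03-01", "a"),
    ("2013-03-02", "a"),
]
average = average_unique_users_per_day(events)
print(average)  # 1.5
```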
alaiacano / matrixproduct.diff
Last active December 31, 2015 03:59
Diff to improve parallelization of matrix multiplication in Scalding 0.8.x. It can also use skewJoins by default, if you want.
From 4d9ff2ef7ae937d856d7a813be8c1f9bb1e36aeb Mon Sep 17 00:00:00 2001
From: Adam Laiacano <[email protected]>
Date: Thu, 12 Dec 2013 15:25:47 -0500
Subject: [PATCH 1/2] add more reducers for matrix join
---
.../scalding/mathematics/MatrixProduct.scala | 35 ++++++++++++----------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/scalding-core/src/main/scala/com/twitter/scalding/mathematics/MatrixProduct.scala b/scalding-core/src/main/scala/com/twitter/scalding/mathematics/MatrixProduct.scala
alaiacano / english.txt
Last active January 1, 2016 01:08
The most popular first words (Roman letters only) from tweets, collected 12/20/2013, which is why "christmas" is on the list. Total number of tweets sampled: 1,067,713
1 i 50798
2 i'm 10518
3 the 10164
4 my 9034
5 if 6996
6 you 6995
7 today 6427
8 this 5966
9 when 4911
10 just 4247
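A ranking like the one above could be produced along the following lines. This is a sketch, not the method actually used for the gist: the lowercasing, the whitespace tokenization, and the "letters and apostrophes only" filter are assumptions, and the sample tweets are invented.

```python
import re
from collections import Counter

# Assumed filter: keep only first words made of a-z letters and apostrophes.
ROMAN_WORD = re.compile(r"[a-z']+")


def first_word_counts(tweets):
    """Count the lowercased first word of each tweet, skipping non-matching tokens."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        if tokens and ROMAN_WORD.fullmatch(tokens[0]):
            counts[tokens[0]] += 1
    return counts


tweets = ["I love this", "I'm late", "the end", "i agree", "#nope hashtag"]
counts = first_word_counts(tweets)

# Print rank, word, and count, mirroring the three-column layout of the list.
for rank, (word, count) in enumerate(counts.most_common(), start=1):
    print(rank, word, count)
```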