alaiacano’s gists

alaiacano / animate.py

Created January 13, 2013 16:18

	#!/usr/bin/python
	"""
	python animate.py folder_of_input_images output_file.gif
	"""
	import os, sys, re

	RESIZE_PERCENT = 12
	QUALITY = 40

	folder = sys.argv[1]

alaiacano / tumblr.html

Created February 19, 2013 18:41

A mostly complete Tumblr post template. The input is a post JSON object from the Tumblr API. I think the Audio post type is still missing.

	<style>
	.quote {
	  border: 1px solid #999;
	  page-break-inside: avoid;
	  padding: 4px;
	  background-color: #333333;
	  color: #eee;
	}
	img {
	  display: block;

alaiacano / build_report.sh

Last active December 14, 2015 23:09

An example data workflow shell script.

	# get rid of old results
	rm -f *.csv

	# run a hive query to get some action counts for each user.
	hive -e "SELECT user, count(*) as actions FROM post_table WHERE dt=${YESTERDAY_DATE} GROUP BY user" > archive_data.csv

	# grab today's data from scribe server
	scp scribe:/var/log/scribe/post_table/*_current raw_data_${TODAY_DATE}.csv

	# parse today's data into the same format (groupby/count) and append it to the archive
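
The groupby/count step named in the last comment could look roughly like the sketch below. It is written in Python rather than shell, and it rests on assumptions the gist does not confirm: that each raw log line is tab-separated with the user in the first field, and that the archive holds `user,actions` rows. The names `count_actions` and `to_csv_rows` are hypothetical.

```python
from collections import Counter


def count_actions(lines, sep="\t"):
    """Count actions per user, assuming the user is the first field of each line."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user = line.split(sep, 1)[0]
        counts[user] += 1
    return counts


def to_csv_rows(counts):
    """Format counts as 'user,actions' rows, sorted by user for stable output."""
    return ["%s,%d" % (user, n) for user, n in sorted(counts.items())]


# Hypothetical raw log lines standing in for the scribe data.
raw = ["alice\tpost\n", "bob\tlike\n", "alice\treblog\n"]
rows = to_csv_rows(count_actions(raw))
for row in rows:
    print(row)
```

Appending these rows to the archive file would then give both days' counts in one format.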

alaiacano / mi.py

Created April 1, 2013 12:26

	import numpy as np


	def entropy(x):
	    return -1. * sum([p * np.log2(p) for p in x if p > 0])


	def conditional_entropy(x, axis=0):
	    if axis not in [0, 1]:
	        raise Exception("Axis must be 0 or 1")
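
As a quick sanity check of the `entropy` function in this preview, here is a minimal self-contained sketch. It mirrors the gist's formula but uses `math.log2` in place of `np.log2` so it runs without NumPy. The example distributions are made up for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Illustrative distributions (not from the gist).
fair_coin = [0.5, 0.5]
certain = [1.0, 0.0]
four_way = [0.25, 0.25, 0.25, 0.25]

h_fair = entropy(fair_coin)   # one bit of uncertainty
h_certain = entropy(certain)  # no uncertainty
h_four = entropy(four_way)    # two bits of uncertainty
```

The `if p > 0` filter matters: without it, a zero probability would hit `log2(0)` and raise an error instead of contributing zero.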

alaiacano / diversity.scala

Last active December 16, 2015 02:59

Scalding function to score the diversity of every county in the USA for 2011. See https://github.com/krishnanraman/bigdata for context.

	/*
	This method is a member of the PopulationStats class at
	https://github.com/krishnanraman/bigdata/blob/master/PopulationStats.scala
	*/
	def diversity(people: Pipe, fipspipe: Pipe) {

	  // we'll need the natural log of 2 in order to get log base 2 later on
	  val log2 = scala.math.log(2)

	  val attrPipe = people

alaiacano / stat.py

Created August 15, 2013 20:38

	# input data:

	# a,1
	# b,2
	# a,3
	# a,4
	# a,7
	# b,4
	# b,0

alaiacano / toeplitz.jl

Last active December 25, 2015 06:08

	x = [1 2 3]

	# julia> toeplitz(x)
	# 3x3 Array{Int64,2}:
	# 1 2 3
	# 2 1 2
	# 3 2 1
	function toeplitz{T}(x::Array{T})
	    n = length(x)
	    A = zeros(T, n, n)
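
The Julia body is cut off in this preview, so the following is only a sketch of the same idea in Python. It assumes the symmetric construction implied by the commented output above, where entry (i, j) is `x[|i - j|]`.

```python
def toeplitz(x):
    """Build the symmetric Toeplitz matrix whose (i, j) entry is x[|i - j|]."""
    n = len(x)
    return [[x[abs(i - j)] for j in range(n)] for i in range(n)]


A = toeplitz([1, 2, 3])
for row in A:
    print(row)
```

For the input `[1, 2, 3]` this gives the same 3x3 matrix shown in the comment.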

alaiacano / bad_hive_example.py

Last active December 26, 2015 04:39

	def average_active_per_day():
	    last_month = (datetime.date.today() - datetime.timedelta(30)).strftime("%Y-%m-%d")
	    hive_query = """
	        SELECT day,
	               AVERAGE(unique_users)
	        FROM (SELECT day,
	                     Distinct(user_id) AS unique_users
	              FROM activity_log
	              WHERE dt >= %s
	              GROUP BY day) x
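
Judging by the function name, the query appears to aim at the average number of distinct active users per day. The sketch below shows that computation on in-memory data so the intent is easier to follow. It is an assumption about the goal, not the gist's code, and the `(day, user_id)` input shape is hypothetical.

```python
from collections import defaultdict


def average_active_per_day(events):
    """events: iterable of (day, user_id) pairs. Returns mean distinct users per day."""
    users_by_day = defaultdict(set)
    for day, user_id in events:
        users_by_day[day].add(user_id)  # a set keeps each user once per day
    if not users_by_day:
        return 0.0
    return sum(len(users) for users in users_by_day.values()) / len(users_by_day)


# Hypothetical activity log rows.
events = [
    ("2013-10-01", "a"),
    ("2013-10-01", "a"),
    ("2013-10-01", "b"),
    ("2013-10-02", "a"),
]
avg = average_active_per_day(events)
```

Here the first day has two distinct users and the second has one, so the two daily counts are averaged rather than the raw event rows.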

alaiacano / matrixproduct.diff

Last active December 31, 2015 03:59

Diff to improve parallelization in matrix multiplication in Scalding 0.8.x. It also uses skewJoins as the default, if you want.

	From 4d9ff2ef7ae937d856d7a813be8c1f9bb1e36aeb Mon Sep 17 00:00:00 2001
	From: Adam Laiacano <[email protected]>
	Date: Thu, 12 Dec 2013 15:25:47 -0500
	Subject: [PATCH 1/2] add more reducers for matrix join

	---
	.../scalding/mathematics/MatrixProduct.scala | 35 ++++++++++++----------
	1 file changed, 20 insertions(+), 15 deletions(-)

	diff --git a/scalding-core/src/main/scala/com/twitter/scalding/mathematics/MatrixProduct.scala b/scalding-core/src/main/scala/com/twitter/scalding/mathematics/MatrixProduct.scala

alaiacano / english.txt

Last active January 1, 2016 01:08

The most popular first words (roman letters only) from tweets (collected 12/20/2013, which is why "christmas" is on there). Total number of tweets sampled: 1,067,713

	1 i 50798
	2 i'm 10518
	3 the 10164
	4 my 9034
	5 if 6996
	6 you 6995
	7 today 6427
	8 this 5966
	9 when 4911
	10 just 4247
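
A count like this could be produced along the lines of the sketch below. It assumes that "roman letters only" means the lowercased first whitespace-separated token consists solely of ASCII letters and apostrophes (the list includes "i'm"). The actual filtering used for the data above is not shown, and the sample tweets are invented.

```python
import re
from collections import Counter

# Assumption: a valid first word is ASCII letters and apostrophes only.
FIRST_WORD = re.compile(r"^[a-z']+$")


def first_word_counts(tweets):
    """Count the lowercased first word of each tweet, skipping non-roman ones."""
    counts = Counter()
    for text in tweets:
        words = text.lower().split()
        if words and FIRST_WORD.match(words[0]):
            counts[words[0]] += 1
    return counts


# Invented sample tweets.
sample = ["I love this", "i'm late", "The end", "I agree", "¡Hola amigos", ""]
counts = first_word_counts(sample)
top = counts.most_common(1)
```

`Counter.most_common` returns the words in descending order of frequency, which is the ordering used in the list above.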

Adam Laiacano alaiacano