Russell Jurney rjurney

@rjurney
rjurney / flask_splat.py
Created December 11, 2012 05:26
How do I do this splatization in Flask without being SO FREAKING UGLY?
# Enable /emails and /emails/ to serve the last 20 emaildb in our inbox unless otherwise specified
default_offsets={'offset1': 0, 'offset2': 0 + config.EMAIL_RANGE}
@app.route('/', defaults=default_offsets)
@app.route('/emails', defaults=default_offsets)
@app.route('/emails/', defaults=default_offsets)
@app.route("/emails/<int:offset1>/<int:offset2>")
def list_emaildb(offset1, offset2):
    offset1 = int(offset1)
    offset2 = int(offset2)
    emails = emaildb.find()[offset1:offset2] # Uses a MongoDB cursor
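One possible way to trim the decorator stack, offered as a sketch rather than a definitive answer: give the view function default arguments and rely on Flask's default trailing-slash redirect to cover `/emails`. Here `EMAIL_RANGE` and the in-memory `EMAILS` list are stand-ins (assumptions) for `config.EMAIL_RANGE` and the MongoDB cursor.

```python
from flask import Flask, jsonify

app = Flask(__name__)

EMAIL_RANGE = 20  # stand-in for config.EMAIL_RANGE (assumed value)
EMAILS = [f"email-{i}" for i in range(100)]  # stand-in for emaildb.find()


@app.route("/")
@app.route("/emails/")  # by default Flask redirects /emails to /emails/
@app.route("/emails/<int:offset1>/<int:offset2>")
def list_emaildb(offset1=0, offset2=EMAIL_RANGE):
    # The int converter already yields integers, so no int() casts are needed.
    return jsonify(EMAILS[offset1:offset2])
```

The function defaults replace the shared `defaults=` dictionary, so only the routes that omit the offsets fall back to the first page.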
anonymous
anonymous / Example.pig
Created December 24, 2012 07:19
Mapping the XML DOM to a Pig schema
characters = load 'example.pig' using XMLLoader('character');
describe characters
{properties:map[], name:chararray, born:datetime, qualification:chararray}
@rjurney
rjurney / example.pig
Created December 24, 2012 07:20 — forked from anonymous/Example.pig
I want to extend Pig's existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to elephant-bird's JsonLoader. Semi-structured data can vary, so this behavior can be risky but... I want people to be able to load JSON and XML data ea…
characters = load 'example.xml' using XMLLoader('character');
describe characters
{properties:map[], name:chararray, born:datetime, qualification:chararray}
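To make the intended mapping concrete, here is a minimal Python sketch of the same idea outside Pig: attributes go into a `properties` map and each child tag becomes a named field. The sample XML is hypothetical (the real `example.xml` is not shown), and values are left as strings rather than converted to `datetime`.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the actual example.xml is not part of the gist preview.
SAMPLE_XML = """
<characters>
  <character id="1" series="example">
    <name>Ada</name>
    <born>1815-12-10</born>
    <qualification>mathematician</qualification>
  </character>
</characters>
"""


def element_to_record(elem):
    """Map one element to a dict: attributes -> 'properties', children -> fields."""
    record = {"properties": dict(elem.attrib)}
    for child in elem:
        record[child.tag] = (child.text or "").strip()
    return record


def load_records(xml_text, tag):
    """Return one record per element named `tag`, like XMLLoader('character')."""
    root = ET.fromstring(xml_text)
    return [element_to_record(e) for e in root.iter(tag)]


records = load_records(SAMPLE_XML, "character")
```

Because semi-structured input can vary, a loader built this way would yield records with differing keys, which is the risk the description mentions.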
@rjurney
rjurney / classify.pig
Created April 10, 2013 21:25
Biggest ILLUSTRATE I've ever had work :)
register /me/Software/elephant-bird/pig/target/elephant-bird-pig-3.0.6-SNAPSHOT.jar
register /me/Software/pig/build/ivy/lib/Pig/json-simple-1.1.jar
set elephantbird.jsonloader.nestedLoad 'true'
/* Remove files from previous runs */
rmf /tmp/prior_words.txt
rmf /tmp/prior_genres.txt
rmf /tmp/p_word_given_genre.txt
rmf /tmp/p_genre_given_word.txt
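The output file names suggest the script builds the four tables a naive Bayes classifier needs. The rest of the Pig script is not shown, so the following Python sketch is only an assumption about that computation: it takes hypothetical `(word, count, genre)` rows and derives the priors and both conditional tables via Bayes' rule.

```python
from collections import defaultdict


def naive_bayes_tables(rows):
    """rows: iterable of (word, count, genre) tuples (assumed input shape)."""
    joint = defaultdict(int)
    word_totals = defaultdict(int)
    genre_totals = defaultdict(int)
    total = 0
    for word, count, genre in rows:
        joint[(word, genre)] += count
        word_totals[word] += count
        genre_totals[genre] += count
        total += count

    prior_words = {w: c / total for w, c in word_totals.items()}
    prior_genres = {g: c / total for g, c in genre_totals.items()}
    p_word_given_genre = {
        (w, g): c / genre_totals[g] for (w, g), c in joint.items()
    }
    # Bayes' rule: P(genre | word) = P(word | genre) * P(genre) / P(word)
    p_genre_given_word = {
        (w, g): p * prior_genres[g] / prior_words[w]
        for (w, g), p in p_word_given_genre.items()
    }
    return prior_words, prior_genres, p_word_given_genre, p_genre_given_word


# Hypothetical counts, for illustration only.
rows = [("live", 1, "rock"), ("live", 3, "hip hop"), ("keep", 4, "rock")]
prior_words, prior_genres, p_wg, p_gw = naive_bayes_tables(rows)
```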
@rjurney
rjurney / convert_to_scikit.py
Last active December 16, 2015 14:20
How to convert a pile of pig training data into the format scikit expects :)
import sys, os
import numpy as np
from collections import defaultdict
from operator import itemgetter
from sklearn.naive_bayes import GaussianNB
# live 1 classic pop and rock
# onli 2 classic pop and rock
# tri 1 classic pop and rock
# keep 3 classic pop and rock
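The rest of the script is cut off in this preview, so the following is only a sketch of one way such rows could be pivoted into the `X` matrix and label list that scikit-learn estimators accept. It assumes each input line is `word count genre` and that each genre becomes one row of `X`; both are assumptions, and the second genre in the sample is made up.

```python
from collections import defaultdict


def to_matrix(lines):
    """Pivot 'word count genre' lines into (X, labels, vocab)."""
    counts = defaultdict(dict)  # genre -> {word: count}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two whitespace runs; the genre may contain spaces.
        word, count, genre = line.split(None, 2)
        counts[genre][word] = counts[genre].get(word, 0) + int(count)

    vocab = sorted({w for wc in counts.values() for w in wc})
    labels = sorted(counts)
    # One row per genre, one column per vocabulary word, 0 where a word is absent.
    X = [[counts[g].get(w, 0) for w in vocab] for g in labels]
    return X, labels, vocab


sample = [
    "live\t1\tclassic pop and rock",
    "onli\t2\tclassic pop and rock",
    "tri\t1\tclassic pop and rock",
    "keep\t3\tclassic pop and rock",
    "live\t4\thip hop",  # hypothetical extra genre
]
X, labels, vocab = to_matrix(sample)
```

The nested lists can be passed to `numpy.array(X)` or directly to an estimator's `fit(X, labels)`.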
@rjurney
rjurney / gist:5570095
Last active December 17, 2015 07:08
A github ForkEvent as a Pig map via elephant-bird and ILLUSTRATE
{
id=1509192740,
repo=[
id#3094885,
name#oetiker/jquery.EmbedPicasaGallery,
url#https://api.github.dev/repos/oetiker/jquery.EmbedPicasaGallery
],
created_at=2012-01-04T20:17:45Z,
payload=[
forkee#
sudo apt-get update
sudo apt-get install build-essential chrpath libssl-dev libxft-dev -y
sudo apt-get install libfreetype6 libfreetype6-dev -y
sudo apt-get install libfontconfig1 libfontconfig1-dev -y
cd ~
export PHANTOM_JS="phantomjs-2.1.1-linux-x86_64"
wget https://github.com/Medium/phantomjs/releases/download/v2.1.1/$PHANTOM_JS.tar.bz2
sudo tar xvjf $PHANTOM_JS.tar.bz2
sudo mv $PHANTOM_JS /usr/local/share
sudo ln -sf /usr/local/share/$PHANTOM_JS/bin/phantomjs /usr/local/bin
@hyfen
hyfen / .bashrc
Created June 12, 2017 01:08
Save unlimited bash history in OSX
# save history to ~/.bash_history as soon as command is run
export PROMPT_COMMAND='history -a'
# save unlimited history
# osx doesn't seem to respect =-1 or = options
export HISTSIZE=9999999999
export HISTFILESIZE=999999999
# osx doesn't actually respect this and it'll fall back to unix timestamp (which we want)
export HISTTIMEFORMAT="%d/%m/%y %T "
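When `HISTTIMEFORMAT` is set, bash writes each command's timestamp to the history file as a `#<epoch seconds>` line ahead of the command. A minimal Python sketch for reading that file back, assuming single-line commands (multi-line entries would need extra handling):

```python
from datetime import datetime, timezone


def parse_history(lines):
    """Return (timestamp or None, command) pairs from bash history lines."""
    entries = []
    ts = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#") and line[1:].isdigit():
            # Timestamp marker for the command on the next line.
            ts = datetime.fromtimestamp(int(line[1:]), tz=timezone.utc)
        elif line:
            entries.append((ts, line))
            ts = None
    return entries


# Hypothetical file contents, for illustration only.
entries = parse_history(["#1497229680\n", "ls -la\n", "cd /tmp\n"])
```

Commands saved before timestamps were enabled have no marker line, so they come back with `None` as the timestamp.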
@ntrepid8
ntrepid8 / cron_speedtest.sh
Created June 28, 2017 02:18
Script to run speedtest-cli via cron and log the results
#!/usr/bin/env bash
LOG_PATH="/home/$(whoami)/log/speedtest.log"
if result=$(/usr/bin/speedtest --simple); then
    # Join the "Key: value" lines into key="value" pairs on one line (GNU sed)
    parsed_result=$(printf '%s"' "${result}" | sed ':a;N;$!ba;s/\n/" /g' | sed 's/: /="/g')
    printf '[%s] %s\n' "$(date)" "${parsed_result}" >> "${LOG_PATH}"
else
    printf '[%s] error\n' "$(date)" >> "${LOG_PATH}"
    exit 1
fi
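The script writes log lines of the form `[date] Ping="..." Download="..." Upload="..."`, or `[date] error` on failure. A short Python sketch for reading them back follows; the sample values are made up, and the field names assume the usual `speedtest --simple` output.

```python
import re

LINE_RE = re.compile(r'^\[(?P<when>[^\]]*)\] (?P<rest>.*)$')
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_log_line(line):
    """Parse one speedtest.log line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    if rest == "error":
        return {"when": match.group("when"), "error": True}
    fields = dict(PAIR_RE.findall(rest))
    return {"when": match.group("when"), "error": False, **fields}


# Hypothetical log lines, for illustration only.
ok = parse_log_line(
    '[Wed Jun 28 02:18:00 UTC 2017] Ping="20.1 ms" Download="50.2 Mbit/s" Upload="10.3 Mbit/s"'
)
failed = parse_log_line("[Wed Jun 28 03:18:00 UTC 2017] error")
```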