Eric Rochester erochest

erochest / update-branches
Last active August 29, 2015 13:56
Transfer all the branches to another repo.
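#!/bin/bash
# Push every remote-tracking branch of $SOURCE to $DEST.
SOURCE=origin
DEST=bitbucket
# List remote branches, skip the HEAD alias line, and strip the remote prefix.
for b in $(git branch -r | grep "$SOURCE/" | grep -v -- '->' | sed "s|^ *$SOURCE/||" | tr -d ' '); do
    echo "$b"
    git checkout "$b"
    git pull
    git push "$DEST" "$b"
done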
#!/usr/bin/env python
"""Run a command and generate an alert when it's done."""
import argparse
import subprocess
erochest / get_data.py
Last active December 28, 2015 17:59
A script for scraping enrollment race data from the MD Report Card site.
#!/usr/bin/env python
# Dependencies:
# - pip install lxml
# - pip install cssselect
# - pip install requests
import collections
import csv
erochest / cabal-brew
Created November 2, 2013 02:12
Using `cabal-install` to generate an installable package for [brew](http://brew.sh/).
#!/bin/bash
# Capture the here-document in USAGE (a bare `USAGE=<<EOF` assigns an empty string).
USAGE=$(cat <<EOF
cabal-brew PACKAGE VERSION
EOF
)
CABAL=cabal
package="$1"
keg="cabal-$package"
version="$2"
erochest / Vagrantfile
Created October 21, 2013 20:59
Bare-bones vagrant files for using Ubuntu 13.10 (Saucy Server) as a basebox.
# vi: set ft=ruby :
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "ubuntu-13.10"
config.vm.box_url = "http://cloud-images.ubuntu.com/vagrant/saucy/current/saucy-server-cloudimg-amd64-vagrant-disk1.box"
end
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "ubuntu-13.10"
config.vm.box_url = "http://cloud-images.ubuntu.com/vagrant/saucy/current/saucy-server-cloudimg-amd64-vagrant-disk1.box"
end
erochest / xml_to_corpus.py
Last active December 25, 2015 02:29
Pull the text from Perseus TEI.
#!/usr/bin/env python
import codecs
import os
import lxml.etree as ET
## CHANGE THIS:

Notes

This contains the files I used to perform the timings, as well as the timings themselves.

The timings cover processing two bags: one with 60,000 small files and one with a single large (10GB) file. Scripts for the bag with many files are named like *-lots, and scripts for the bag with one large file are named like *-large.
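
As a rough illustration of how one such timing could be taken, here is a minimal Python sketch. It is not the harness used for these notes: the `time_command` helper and the `example-lots` / `example-large` labels are hypothetical, and the commands are placeholders that do nothing, to be swapped for the real scripts.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


if __name__ == "__main__":
    # Hypothetical labels and placeholder commands (assumptions, not the
    # actual *-lots and *-large scripts from this gist).
    commands = {
        "example-lots": [sys.executable, "-c", "pass"],
        "example-large": [sys.executable, "-c", "pass"],
    }
    for name, argv in commands.items():
        elapsed, code = time_command(argv)
        print(f"{name}: {elapsed:.3f}s (exit {code})")
```

A single run is noisy, especially with a 10GB file where disk caching matters, so repeating each command a few times and comparing the runs is usually more informative.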

What I'm Timing

Ruby

<address class="vcard" vocab="http://www.w3.org/2006/vcard/ns#" resource="http://scholarslab.org/" typeof="Organization">
<span class="org fn">
<a class="url organization-name" href="http://scholarslab.org/">
<span property="formattedName">Scholars’ Lab</span>
</a>
<a class="organization-unit extended-address" href="http://lib.virginia.edu/" property="hasOrganizationName" resource="http://lib.virginia.edu/" typeof="Organization">
<span property="formattedName">University of Virginia Library</span>
</a>
</span>
<span property="hasAddress" typeof="Work">