Eric Rochester erochest

@erochest
erochest / change-geoserver-host.sql
Created December 19, 2012 20:55
More unholy SQL tricks. This time with Postgres.
CREATE EXTENSION xml2;
-- Credit where credit's due. Or blame:
-- http://dba.publicexchange.com/a/8188
create or replace function bytea_import(p_path text, p_result out bytea)
language plpgsql as $$
declare
l_oid oid;
r record;
begin
(function(window, $, undefined) {
var $window = $(window);
/**
* Show or hide the button depending on the scroll position.
*/
function animateButton() {
var button = $('#back-to-top');
var scrollPosition = $window.scrollTop();
if (scrollPosition > 400) {
@erochest
erochest / start-rserve.sh
Created February 17, 2013 23:12
A simple shell script to start Rserve.
#!/bin/sh
R -e 'library(Rserve)' -e 'Rserve(args="--vanilla")'
@erochest
erochest / lein.bat
Created February 19, 2013 16:32
A current Leiningen batch file script for Windows that downloads Leiningen 2.0.0.
@echo off
setLocal EnableExtensions EnableDelayedExpansion
set LEIN_VERSION=2.0.0
if "%LEIN_VERSION:~-9%" == "-SNAPSHOT" (
set SNAPSHOT=YES
) else (
set SNAPSHOT=NO
@erochest
erochest / base.pp
Last active December 17, 2015 14:08
Some puppet files for setting up my personal config under a Vagrant-managed VM.
# A palate cleanser for apt.
exec { 'apt-get update':
path => ['/usr/bin'],
}
## These two use this module: https://github.com/erochest/puppet-omeka
## Use this to automate getting that set up: https://github.com/erochest/omeka-vm
class { 'omeka':
@erochest
erochest / gist:5853420
Last active December 18, 2015 22:19
Ruby snippets
[1, 1, 2, 3, 5].map { |x| x * 2 }  # map collects the doubled values; each would discard them
<address class="vcard" vocab="http://www.w3.org/2006/vcard/ns#" resource="http://scholarslab.org/" typeof="Organization">
<span class="org fn">
<a class="url organization-name" href="http://scholarslab.org/">
<span property="formattedName">Scholars’ Lab</span>
</a>
<a class="organization-unit extended-address" href="http://lib.virginia.edu/" property="hasOrganizationName" resource="http://lib.virginia.edu/" typeof="Organization">
<span property="formattedName">University of Virginia Library</span>
</a>
</span>
<span property="hasAddress" typeof="Work">

Notes

This gist contains the files I used to perform the timings, as well as the timings themselves.

The timings cover processing two bags: one with 60,000 small files and one with a single large (10GB) file. Scripts for the many-files bag are named like *-lots, and scripts for the large-file bag are named like *-large.

What I'm Timing

Ruby

@erochest
erochest / xml_to_corpus.py
Last active December 25, 2015 02:29
Pull the text from Perseus TEI.
#!/usr/bin/env python
import codecs
import os
import lxml.etree as ET
## CHANGE THIS:
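
The preview above is cut off before the extraction logic, so what follows is only a rough sketch of what "pulling the text" from a TEI file can look like, not the gist's actual code. It assumes the text of interest sits under the first `<body>` element, and it uses the standard library's `xml.etree.ElementTree` rather than the `lxml.etree` module the gist imports, so that it runs without extra packages. The function name `tei_body_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace. Assumption: older files may declare no namespace at all,
# so both spellings of the tag are accepted below.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def tei_body_text(xml_string):
    """Return the whitespace-normalized text under the first <body> element."""
    root = ET.fromstring(xml_string)
    body = None
    for elem in root.iter():
        # Match <body> with or without the TEI namespace.
        if elem.tag in ("body", TEI_NS + "body"):
            body = elem
            break
    if body is None:
        return ""
    # itertext() walks every text node in document order; split/join
    # collapses runs of whitespace and newlines into single spaces.
    return " ".join("".join(body.itertext()).split())


# Hypothetical sample document, for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Arma virumque <hi>cano</hi>,</p>
    <p>Troiae qui primus ab oris</p>
  </body></text>
</TEI>"""

print(tei_body_text(sample))
```

Skipping the `<teiHeader>` keeps titles and other metadata out of the corpus text. With `lxml.etree`, the same `iter()` and `itertext()` calls are available, so the sketch should carry over with only the import changed.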