Bill Dueber billdueber

@billdueber
billdueber / gist:1154163
Created August 18, 2011 14:29
OSX command-line args
1. Go to the app directory
cd /Applications/Google\ Chrome.app/Contents/MacOS/
2. Rename the app to app.orig
mv Google\ Chrome Google\ Chrome.orig
3. Create a shell script with the original name that uses the args you want
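The three steps above can be sketched as one small shell function. This is only an illustration under stated assumptions: `wrap_with_args` is a hypothetical helper name, the wrapper locates the renamed binary relative to its own path via `$0`, and the fixed arguments are assumed not to contain single quotes.

```shell
#!/bin/sh
# Sketch: wrap an executable so it always starts with extra arguments.
# wrap_with_args is a hypothetical helper, not part of OS X or Chrome.
# Usage: wrap_with_args DIR NAME ARG...
wrap_with_args() {
    dir=$1
    name=$2
    shift 2

    # Step 2: keep the real binary as NAME.orig (skipped if already done).
    if [ ! -e "$dir/$name.orig" ]; then
        mv "$dir/$name" "$dir/$name.orig" || return 1
    fi

    # Step 3: write a shell script under the original name. It execs the
    # renamed binary with the fixed arguments first, followed by whatever
    # arguments the wrapper itself receives ("$@").
    {
        printf '%s\n' '#!/bin/sh'
        printf 'exec "$(dirname "$0")/%s.orig"' "$name"
        for arg in "$@"; do
            # Assumes the argument contains no single quotes.
            printf " '%s'" "$arg"
        done
        printf ' "$@"\n'
    } > "$dir/$name"
    chmod +x "$dir/$name"
}
```

For the Chrome case this would be called as `wrap_with_args "/Applications/Google Chrome.app/Contents/MacOS" "Google Chrome" --some-flag`, where `--some-flag` is a placeholder for the arguments you actually want.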
billdueber / breakjump.rb
Created June 24, 2011 18:57
Log and inability to catch breakjump under jruby 1.6.2
require 'java'
def showBreakProblem &blk
  threads = 2
  consumers = []
  threads.times do |i|
    consumers << Thread.new(i) do |i|
      Thread.current[:num] = i
      begin
billdueber / ruby-marc-unicode.rb
Created May 31, 2011 20:39
Simple program to show 1.8 vs 1.9 rubymarc issues with unicode
require 'rubygems'
require 'marc'
require 'open-uri'
r = MARC::Reader.new(open('http://mirlyn.lib.umich.edu/Record/000039829.marc')).first
puts r
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "explainOther":"",
      "fl":"*,score",
      "indent":"on",
      "start":"0",
billdueber / sitemapindex.pl
Created February 4, 2011 16:02
Create a set of simple sitemap files for google to crawl
# Then just create a simple XML file pointing to the 50k line files
# Don't forget to gzip the files first
#!/usr/local/bin/perl
my $numfiles = $ARGV[0]; # number of files generated before
my $urlToSitemapDir = 'http://www.my.machine.edu/dir/for/sitemaps';
print q{<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
billdueber / Hathi catalog.txt
Created December 2, 2010 14:52
All 035 types with more than 1000 records in UMich/HathiTrust
1006 DLI
1040 GyWOH
1059 SciDir
1062 NYU
1077 ItFiC
1150 CSt
1175 FrPJT
1176 DGPO
1182 NjP
1196 BLKDR
billdueber / suss_example.rb
Created November 10, 2010 18:05
Example of how to use jruby_streaming_update_solr_server
require 'rubygems'
require 'threach'
require 'jruby_streaming_update_solr_server'
solrURL = 'your solr url'
sussQueueSize = 128 # number of docs to queue up
sussThreads = 1 # number of threads to use to send stuff to solr
threads = 3 # number of threads to use to process the data
billdueber / marc_deserialization_bench.rb
Created October 25, 2010 20:26
Deserialization speed of marc-in-json vs marcxml under ruby-marc with fastest available libraries
require 'rubygems'
require 'marc'
require 'yajl'
require 'benchmark'
iterations = 5
xmlsourcefile = 'topics.xml' # 18k records as a MARC-XML collection
jsonsourcefile = 'topics.ndj' # Same records as newline-delimited marc-in-json
billdueber / autoload.rb
Created October 25, 2010 17:41
Problem with autoload and threading
require 'rubygems'
require 'rdf'
require 'threach'
(1..10).threach(3) do |c|
  u = RDF::URI.new("http://example.org/#{c}/")
  puts u.to_s
end
# Code to benchmark various serializations of MARC records using ruby-marc
# Not included is XML -- serialization using ruby-marc is ridiculously slow and the
# filesizes are bigger than anything else. Even with the lib-xml reader,
# deserialization is also relatively slow
#
# I didn't bother to benchmark json/pure in later runs because it's just so damn
# slow that it would never be a good choice.
#
# My results can be found at http://robotlibrarian.billdueber.com/sizespeed-of-various-marc-serializations-using-ruby-marc/
require 'marc'