Bill Dueber billdueber

@billdueber
billdueber / gist:4596330
Last active December 11, 2015 11:48
File.read vs File.read with size
require 'benchmark'
bigfiles = %w[ddd1 ddd2 ddd3] # all copies of the same 6MB file
puts RUBY_DESCRIPTION
puts "Bigfile size is #{File.size('ddd1')}"
Benchmark.bmbm do |bm|
  bm.report("straight read") { bigfiles.each { |bigfile| File.read(bigfile) } }
  # File.read with a length argument reads exactly that many bytes
  bm.report("read w/ size")  { bigfiles.each { |bigfile| File.read(bigfile, File.size(bigfile)) } }
end
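The benchmark above assumes ddd1 to ddd3 already exist. A minimal self-contained variant, assuming random bytes are an acceptable stand-in for the original 6MB file, writes its own temp files first and then checks that both read styles return the same bytes:

```ruby
require 'benchmark'
require 'tempfile'

# Assumption: random bytes stand in for the original 6MB file.
SIZE = 6 * 1024 * 1024
data = Random.new(42).bytes(SIZE)

# Three identical temp files, like ddd1..ddd3 above.
files = Array.new(3) do |i|
  f = Tempfile.new("ddd#{i + 1}")
  f.binmode
  f.write(data)
  f.close              # flushes; the file stays on disk until #unlink
  f
end
paths = files.map(&:path)

puts RUBY_DESCRIPTION
puts "Bigfile size is #{File.size(paths.first)}"
Benchmark.bmbm do |bm|
  bm.report("straight read") { paths.each { |p| File.read(p) } }
  bm.report("read w/ size")  { paths.each { |p| File.read(p, File.size(p)) } }
end

# Both styles should yield the same bytes for an unchanged file.
same = paths.all? { |p| File.read(p).b == File.read(p, File.size(p)).b }
puts "contents match: #{same}"
files.each(&:unlink)
```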
billdueber / gist:4484735
Last active December 10, 2015 19:58
Packing a bunch of ids into an encrypted string for use in a URL -- just some experiments. Driven by https://bibwild.wordpress.com/2013/01/07/crazy-use-of-encryption-to-protect-refworks-callback-urls/
# A quick experiment to look at how many ids of various types we can fit into an
# encrypted string. Obviously we could be more particular about how we pack them in
# if we know the characteristics of the IDs ahead of time.
require 'stringio'
require 'zlib'
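The rest of that gist is not shown here. As a sketch of the general idea from the linked post, assuming comma-joined IDs, Zlib deflate, and AES-256-CBC from Ruby's openssl extension are acceptable choices (the helper names `pack_ids` and `unpack_ids` are hypothetical), the pipeline might look like this. CBC alone only hides the IDs; detecting tampering would also need a MAC or an authenticated mode such as GCM.

```ruby
require 'openssl'
require 'zlib'

# Hypothetical helpers, not the gist's code: join ids, deflate, encrypt with
# AES-256-CBC, then Base64 with the URL-safe alphabet ('-' and '_').
def pack_ids(ids, key)
  plain  = Zlib::Deflate.deflate(ids.join(','))
  cipher = OpenSSL::Cipher.new('aes-256-cbc').encrypt
  cipher.key = key                    # must be 32 bytes for AES-256
  iv  = cipher.random_iv              # fresh 16-byte IV per token
  raw = iv + cipher.update(plain) + cipher.final
  [raw].pack('m0').tr('+/', '-_')     # strict Base64, URL-safe alphabet
end

def unpack_ids(token, key)
  raw = token.tr('-_', '+/').unpack('m0').first
  decipher = OpenSSL::Cipher.new('aes-256-cbc').decrypt
  decipher.key = key
  decipher.iv  = raw[0, 16]           # IV was prepended by pack_ids
  plain = decipher.update(raw[16..-1]) + decipher.final
  Zlib::Inflate.inflate(plain).split(',')
end

key   = OpenSSL::Random.random_bytes(32)
ids   = (1..200).map { |i| format('ocm%08d', i) }
token = pack_ids(ids, key)
puts "#{ids.size} ids -> #{token.size} URL characters"
```

Deflating before encrypting matters here: ciphertext looks random and will not compress, so any size savings have to come first.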
billdueber / gist:3213716
Created July 31, 2012 04:44
JRuby JSON-generation slowdown benchmark
require 'benchmark'
require 'json'
puts RUBY_DESCRIPTION
# This mess is a JSON representation of a MARC record (a format used by libraries and museums)
m = %Q[{"leader":"01470nam^a22004451^^4500","fields":[{"001":"000000040"},{"005":"19880715000000.0"},{"006":"m^^^^^^^^d^^^^^^^^"},{"007":"cr^bn^---auaua"},{"008":"880715s1968^^^^nyuae^^^^b^^^|00100^eng^^"},{"010":{"ind1":" ","ind2":" ","subfields":[{"a":"68027371"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(RLIN)MIUG0001728-B"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(CaOTULAS)159818044"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(OCoLC)ocm00001728"}]}},{"040":{"ind1":" ","ind2":" ","subfields":[{"a":"DLC"},{"c":"DLC"},{"d":"MiU"},{"d":"CStRLIN"},{"d":"MiU"}]}},{"050":{"ind1":"0","ind2":" ","subfields":[{"a":"N6350"},{"b":".P4 1968b"}]}},{"082":{"ind1":" ","ind2":" ","subfields":[{"a":"709.03"}]}},{"100":{"ind1":"1","ind2":" ","subfields":[{"a":"Pevsner, Nikolaus,"},{"d":"1902-1983."}]}},{"245":{"ind1":"1","ind2":"0","subfields
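The record literal above is cut off, so the rest of the benchmark is not visible. A minimal sketch of timing JSON generation, assuming a shortened record built from a few of the fields shown (the real one is longer), could be:

```ruby
require 'benchmark'
require 'json'

# Assumption: a shortened MARC-in-JSON record using fields from the gist.
record = {
  'leader' => '01470nam^a22004451^^4500',
  'fields' => [
    { '001' => '000000040' },
    { '005' => '19880715000000.0' },
    { '100' => { 'ind1' => '1', 'ind2' => ' ',
                 'subfields' => [{ 'a' => 'Pevsner, Nikolaus,' }, { 'd' => '1902-1983.' }] } }
  ]
}

n = 10_000
puts RUBY_DESCRIPTION
Benchmark.bmbm do |bm|
  bm.report('JSON.generate') { n.times { JSON.generate(record) } }
  bm.report('#to_json')      { n.times { record.to_json } }
end

# Sanity check: the generated JSON parses back to the same structure.
roundtrip = JSON.parse(JSON.generate(record)) == record
puts "round trip ok: #{roundtrip}"
```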
billdueber / decode_bench_results.txt
Created June 1, 2012 16:23
jruby: weird Benchmark results?
jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (OpenJDK 64-Bit Server VM 1.7.0-u4-b13) [darwin-amd64-java]
      user     system      total        real
  4.650000   0.000000   4.650000 (  4.650000)
jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (OpenJDK 64-Bit Server VM 1.7.0-u4-b13) [darwin-amd64-java]
      user     system      total        real
 12.090000   0.280000  12.370000 (  5.778000)
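What looks odd in the second run is that user time (12.09s) is more than twice the real, wall-clock time (5.778s). Benchmark's user and system columns are CPU time accumulated by the process (as reported by Process.times), so a total far above real means more than one thread was busy at once; on the JVM, background JIT-compiler and GC threads are plausible candidates, though the output alone does not show which. A small helper (the name `cpu_to_wall` is hypothetical) makes the ratio explicit:

```ruby
require 'benchmark'

# Ratio of CPU time (user + system, plus children) to wall-clock time.
# Roughly 1.0 means one busy thread; well above 1.0 means parallel work.
def cpu_to_wall(tms)
  tms.real.zero? ? 0.0 : tms.total / tms.real
end

# The two runs from the output above, rebuilt as Benchmark::Tms values
# (utime, stime, cutime, cstime, real).
runs = {
  '1.6.7.2'        => Benchmark::Tms.new(4.65, 0.0, 0.0, 0.0, 4.65),
  '1.7.0.preview1' => Benchmark::Tms.new(12.09, 0.28, 0.0, 0.0, 5.778)
}
runs.each { |v, tms| printf("%-15s cpu/wall = %.2f\n", v, cpu_to_wall(tms)) }

# The same check on a fresh measurement in the current interpreter.
tms = Benchmark.measure { 200_000.times { |i| i.to_s * 3 } }
printf("%-15s cpu/wall = %.2f\n", 'this run', cpu_to_wall(tms))
```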
billdueber / pdf_links.rb
Created March 6, 2012 19:28
pdf list of links
billdueber / gist:1979628
Created March 5, 2012 17:30
marc_marc4j reader producing ruby-marc records
Testing on an 18K file in both MARC21 and MARC-XML. The loop looks like:
reader = MARC::Reader.new(m21file) # or whichever reader is appropriate
reader.each do |r|
  t = r['245']['a']
end
MARC version is the just-released 0.4.4
The following numbers are for a run with just enough compatibility to run the above code.
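The timing numbers themselves are not included above. A minimal harness for producing comparable numbers, assuming any reader that yields records from `#each` and supports `r['245']['a']` (the helper name is hypothetical, and plain hashes stand in for records so the sketch runs without the marc gem), might be:

```ruby
require 'benchmark'

# Times one pass over any reader that yields records via #each, doing the
# same work as the loop above (pull 245$a). Build the reader yourself,
# e.g. MARC::Reader.new(path) for MARC21.
def time_title_pass(reader)
  count = 0
  tms = Benchmark.measure do
    reader.each do |r|
      field  = r['245']
      _title = field && field['a']   # guard: a record may lack a 245
      count += 1
    end
  end
  [count, tms.real]
end

# Stand-in "reader": plain hashes shaped so that r['245']['a'] works.
fake_reader = Array.new(18_000) { |i| { '245' => { 'a' => "Title #{i}" } } }
count, secs = time_title_pass(fake_reader)
printf("%d records in %.3fs\n", count, secs)
```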
billdueber / gist:1947347
Created March 1, 2012 04:46
numericID solr fieldtype
<fieldtype name="numericID" class="solr.TextField"
positionIncrementGap="1000" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="^.*?(\p{N}[\p{N}\-\.]{6,}[xX]?).*$"
replacement="***$1" />
<filter class="solr.PatternReplaceFilterFactory"
pattern="^[^\*].*$" replacement="" />
<filter class="solr.PatternReplaceFilterFactory"
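The analyzer chain is cut off after the second filter. As a rough illustration, assuming Ruby's regex engine treats these patterns the same way Java's does for single-line values, the first two steps amount to: prefix the first run of a digit followed by six or more digits, hyphens or periods (optionally ending in X) with `***`, then blank out any token that was not tagged. The method name is hypothetical, and the truncated third filter is not modeled.

```ruby
# Hypothetical Ruby mirror of the first two filters; KeywordTokenizerFactory
# emits the whole field value as one token, so each call sees one token.
FIND_ID    = /^.*?(\p{N}[\p{N}\-\.]{6,}[xX]?).*$/  # same pattern as filter 1
NOT_TAGGED = /^[^\*].*$/                           # same pattern as filter 2

def numeric_id_token(value)
  tagged = value.sub(FIND_ID, '***\1')  # filter 1: keep the ID run, prefixed with ***
  tagged.sub(NOT_TAGGED, '')            # filter 2: empty any token not starting with *
end

['ISBN 0-387-97765-2', '(OCoLC)ocm00001728', '12345', 'no digits here'].each do |v|
  puts "#{v.inspect} -> #{numeric_id_token(v).inspect}"
end
```

Because both patterns are anchored to the whole token, a single `sub` gives the same result as Solr's default replace-all behavior here.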
billdueber / ruby-marc_bench.md
Created January 25, 2012 18:00
Ruby-marc slow on strict parser

I upgraded Ruby 1.8 to the latest patchlevel and suddenly ruby-marc became very slow. I saw the same slowdown on 1.9 and in JRuby, so I investigated.

There's a `marc.bytes.to_a` call inside the loop in `Reader#decode`, so the record's byte array is rebuilt on every iteration. The fix simply moves that call outside the loop so it happens only once.
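A toy illustration of why the move matters, using hypothetical stand-ins rather than ruby-marc's real `Reader#decode`: if the byte array is built inside a per-field loop, each record costs roughly fields × record-size work instead of record-size work once.

```ruby
require 'benchmark'

# Hypothetical stand-ins: `marc` is a raw record string and `entries` lists
# [start, length] pairs, loosely like a MARC directory. Only the placement
# of bytes.to_a differs between the two methods.
def fields_slow(marc, entries)
  entries.map do |start, len|
    bytes = marc.bytes.to_a              # rebuilt for every field
    bytes[start, len].pack('C*')
  end
end

def fields_fast(marc, entries)
  bytes = marc.bytes.to_a                # hoisted: built once per record
  entries.map { |start, len| bytes[start, len].pack('C*') }
end

marc    = 'x' * 2_000
entries = (0...50).map { |i| [i * 40, 40] }
Benchmark.bm(6) do |bm|
  bm.report('slow') { 50.times { fields_slow(marc, entries) } }
  bm.report('fast') { 50.times { fields_fast(marc, entries) } }
end
```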

You can see the patch in the "slowreadfix" branch at https://github.com/ruby-marc/ruby-marc/commit/beba83745ebe0848218496e967edd65d632fb01e

As you can see, the speedup is about a factor of five.

The test case reads a MARC21 file with about 18K records.

billdueber / extend.rb
Created December 19, 2011 16:51
JRuby: #extend 10x slower in 1.7 w/OpenJDK 1.7?
module A
  def a
  end
end
class C
end
n = 100_000
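The snippet stops after defining `n`. One plausible way to time `#extend`, assuming the benchmark extends a fresh `C` instance with `A` on each iteration (the report labels and the allocation-only baseline are additions, not from the gist), is:

```ruby
require 'benchmark'

module A
  def a
  end
end

class C
end

n = 100_000
puts RUBY_DESCRIPTION
Benchmark.bm(10) do |bm|
  bm.report('extend')    { n.times { C.new.extend(A) } }  # new object + singleton class + include
  bm.report('plain new') { n.times { C.new } }            # baseline: allocation only
end
```

Comparing against the plain allocation baseline helps separate the cost of `#extend` itself from object creation when running the same script on different JRuby/JDK combinations.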
billdueber / parser1.rb
Created November 18, 2011 05:26
Simple term parser
require 'parslet'
require 'pp'
class AdvParser < Parslet::Parser
  rule(:space)     { match['\\s\\t'].repeat(1) } # at least one space/tab
  rule(:space?)    { space.maybe }               # zero or one match of the 'space' rule
  rule(:startexpr) { str('(') >> space? }        # '(' followed by optional space
  rule(:endexpr)   { space? >> str(')') }        # optional space followed by ')'
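The rest of the parslet grammar is cut off. For readers without the parslet gem, here is a dependency-free stand-in built on Ruby's StringScanner (a swapped-in technique, not parslet). It follows the `space`, `space?`, `startexpr` and `endexpr` rules shown; the `TERM` rule, the class name, and the nested-array output are hypothetical, since the remaining rules are not visible.

```ruby
require 'strscan'

# Hand-rolled recursive-descent stand-in for the parslet rules above.
class TinyTermParser
  SPACE = /\s+/         # same set as match['\\s\\t']: \s already covers tab
  TERM  = /[^\s()]+/    # hypothetical: a run of non-space, non-paren chars

  def parse(input)
    @ss = StringScanner.new(input)
    @ss.skip(SPACE)                       # space?
    items = parse_items
    @ss.skip(SPACE)
    raise ArgumentError, "unexpected input at #{@ss.pos}" unless @ss.eos?
    items
  end

  private

  # Items (terms or parenthesized groups) separated by whitespace.
  def parse_items
    items = []
    loop do
      item = parse_item
      break unless item
      items << item
      break unless @ss.skip(SPACE)
    end
    items
  end

  def parse_item
    if @ss.skip(/\(/)                     # startexpr: '(' ...
      @ss.skip(SPACE)                     # ... followed by optional space
      inner = parse_items
      @ss.skip(SPACE)                     # endexpr: optional space, then ')'
      raise ArgumentError, "missing ) at #{@ss.pos}" unless @ss.skip(/\)/)
      inner
    else
      @ss.scan(TERM)                      # nil when nothing matches
    end
  end
end

p TinyTermParser.new.parse('title (war peace) tolstoy')
```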