<fieldtype name="numericID" class="solr.TextField"
    positionIncrementGap="1000" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="^.*?(\p{N}[\p{N}\-\.]{6,}[xX]?).*$"
      replacement="***$1" />
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="^[^\*].*$" replacement="" />
    <filter class="solr.PatternReplaceFilterFactory"
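The first two filters can be traced outside Solr. Below is a minimal Ruby sketch, assuming each field value is a single line (the KeywordTokenizer passes the whole value through as one token) and that Ruby's regex engine treats these patterns the way Java's does; `numeric_id_stage` is a hypothetical name. The third filter is cut off in this preview, so it is not modeled.

```ruby
# Ruby stand-ins for the first two PatternReplaceFilterFactory patterns.
# \A and \z replace ^ and $ because each token is assumed to be one line.
EXTRACT_ID = /\A.*?(\p{N}[\p{N}\-.]{6,}[xX]?).*\z/
UNMARKED   = /\A[^*].*\z/

# Hypothetical helper: run one token through both filters.
def numeric_id_stage(token)
  # Filter 1: if the token contains a digit followed by 6+ digits, hyphens,
  # or dots (optionally ending in x/X), replace the whole token with
  # "***" plus that run; otherwise leave it unchanged.
  marked = token.sub(EXTRACT_ID, '***\1')
  # Filter 2: any token that was not marked (does not start with "*") is blanked.
  marked.sub(UNMARKED, '')
end

puts numeric_id_stage("ISBN 0-14-044913-X (pbk.)") # => ***0-14-044913-X
puts numeric_id_stage("no digits here").inspect    # => ""
```

The `***` prefix acts as a marker: the second filter uses it to tell "an ID was found" apart from "nothing matched", and empties every unmarked token.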
Testing on an 18K file in both MARC21 and MARC-XML. The loop looks like:

    reader = MARC::Reader.new(m21file) # or MARC::XMLReader for the MARC-XML file
    reader.each do |r|
      t = r['245']['a']
    end

The marc gem version is the just-released 0.4.4.
The following numbers are for a run with just enough compatibility to run the above code.
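A minimal sketch of timing that loop with Ruby's standard `Benchmark`. `time_title_loop` is a hypothetical helper: it accepts any object whose `each` yields records supporting `record['245']['a']`, so with the marc gem you would pass a `MARC::Reader` or `MARC::XMLReader`. The nil check for records without a 245 field is an addition that the original loop does not have, and the file names in the usage comment are placeholders.

```ruby
require 'benchmark'

# Hypothetical helper: time the title-extraction loop for any reader whose
# #each yields records that support record['245']['a'].
# Returns [number_of_records, elapsed_seconds].
def time_title_loop(reader)
  count = 0
  elapsed = Benchmark.realtime do
    reader.each do |r|
      field = r['245']
      _title = field && field['a'] # skip records with no 245 instead of raising
      count += 1
    end
  end
  [count, elapsed]
end

# With the marc gem installed, usage would look like:
#   require 'marc'
#   n, secs = time_title_loop(MARC::Reader.new('records.mrc'))
#   n, secs = time_title_loop(MARC::XMLReader.new('records.xml'))
```

Keeping the reader as a parameter means the same timing code runs unchanged against both serializations, which is the comparison the note above describes.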
load '/path/to/itextpdf-5.2.0.jar'
IText = Java::com.itextpdf.text # save some typing
doc = IText::Document.new(IText::PageSize::LETTER, 50, 50, 50, 50)
# addAuthor and addSubject don't seem to work, at least when viewing in Preview
doc.addAuthor "Bill Dueber"
doc.addSubject "Why are we requiring PDF???"
jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (OpenJDK 64-Bit Server VM 1.7.0-u4-b13) [darwin-amd64-java]
      user     system      total        real
  4.650000   0.000000   4.650000 (  4.650000)

jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (OpenJDK 64-Bit Server VM 1.7.0-u4-b13) [darwin-amd64-java]
      user     system      total        real
 12.090000   0.280000  12.370000 (  5.778000)
require 'benchmark'
require 'json'
puts RUBY_DESCRIPTION
# This mess is a JSON representation of a MARC record (the format used by libraries and museums)
m = %Q[{"leader":"01470nam^a22004451^^4500","fields":[{"001":"000000040"},{"005":"19880715000000.0"},{"006":"m^^^^^^^^d^^^^^^^^"},{"007":"cr^bn^---auaua"},{"008":"880715s1968^^^^nyuae^^^^b^^^|00100^eng^^"},{"010":{"ind1":" ","ind2":" ","subfields":[{"a":"68027371"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(RLIN)MIUG0001728-B"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(CaOTULAS)159818044"}]}},{"035":{"ind1":" ","ind2":" ","subfields":[{"a":"(OCoLC)ocm00001728"}]}},{"040":{"ind1":" ","ind2":" ","subfields":[{"a":"DLC"},{"c":"DLC"},{"d":"MiU"},{"d":"CStRLIN"},{"d":"MiU"}]}},{"050":{"ind1":"0","ind2":" ","subfields":[{"a":"N6350"},{"b":".P4 1968b"}]}},{"082":{"ind1":" ","ind2":" ","subfields":[{"a":"709.03"}]}},{"100":{"ind1":"1","ind2":" ","subfields":[{"a":"Pevsner, Nikolaus,"},{"d":"1902-1983."}]}},{"245":{"ind1":"1","ind2":"0","subfields |
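Because the record string is truncated in this preview, the sketch below uses a shortened record built only from fields visible above (the leader, 001, and 100) and times `JSON.parse` on it. `SAMPLE_RECORD` and `bench_parse` are hypothetical names, and timings on this small record will not match the full one.

```ruby
require 'benchmark'
require 'json'

# Shortened MARC-in-JSON record using only fields visible in the original
# string (leader, 001, 100); the full record is truncated in the preview.
SAMPLE_RECORD = %Q[{"leader":"01470nam^a22004451^^4500","fields":[{"001":"000000040"},{"100":{"ind1":"1","ind2":" ","subfields":[{"a":"Pevsner, Nikolaus,"},{"d":"1902-1983."}]}}]}]

# Hypothetical helper: parse the JSON n times, print the timing,
# and return the last parsed Hash so the result can be inspected.
def bench_parse(json, n)
  parsed = nil
  Benchmark.bm(10) do |bm|
    bm.report("JSON.parse") { n.times { parsed = JSON.parse(json) } }
  end
  parsed
end

rec = bench_parse(SAMPLE_RECORD, 10_000)
puts rec['fields'].size # => 2
```

Each entry in `fields` is a one-key Hash keyed by tag, so a subfield is reached as `rec['fields'][1]['100']['subfields'][0]['a']`.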
# Packing a bunch of ids into an encrypted string for use in a URL -- just some experiments.
# Driven by https://bibwild.wordpress.com/2013/01/07/crazy-use-of-encryption-to-protect-refworks-callback-urls/
# A quick experiment to see how many ids of various types we can cram into an
# encrypted string. Obviously we could be more particular about how we pack them in
# if we know the characteristics of the IDs ahead of time.
require 'stringio'
require 'zlib'
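One way to run the experiment, sketched under assumptions: ids are joined with commas, deflated with `Zlib`, encrypted with AES-256-CBC from Ruby's `openssl`, and Base64-encoded with `+` and `/` swapped for the URL-safe `-` and `_`. The helper names, the comma separator, the cipher choice, and the `ocm`-style sample ids are all assumptions, not taken from the linked post. CBC on its own hides the ids but does not detect tampering; for protecting callback URLs, an authenticated mode such as AES-GCM or an added HMAC would be the safer choice.

```ruby
require 'openssl'
require 'zlib'

# Hypothetical helpers: ids -> comma-joined -> deflated -> AES-256-CBC -> URL-safe Base64.
def pack_ids(ids, key)
  compressed = Zlib::Deflate.deflate(ids.join(','))
  cipher = OpenSSL::Cipher.new('aes-256-cbc').encrypt
  cipher.key = key
  iv = cipher.random_iv # fresh 16-byte IV per token, prepended to the ciphertext
  ciphertext = cipher.update(compressed) + cipher.final
  # Strict Base64 (no newlines), then swap '+' and '/' for URL-safe characters.
  [iv + ciphertext].pack('m0').tr('+/', '-_')
end

def unpack_ids(token, key)
  raw = token.tr('-_', '+/').unpack1('m0')
  decipher = OpenSSL::Cipher.new('aes-256-cbc').decrypt
  decipher.key = key
  decipher.iv = raw[0, 16]
  compressed = decipher.update(raw[16..-1]) + decipher.final
  Zlib::Inflate.inflate(compressed).force_encoding(Encoding::UTF_8).split(',')
end

key = OpenSSL::Random.random_bytes(32) # AES-256 needs a 32-byte key
ids = (1..50).map { |i| format('ocm%08d', i) } # hypothetical sample ids
token = pack_ids(ids, key)
puts "#{ids.size} ids -> #{token.length} URL characters"
```

Varying the number and shape of the ids and comparing `token.length` gives the "how many fit" numbers; the `=` padding is kept so decoding stays strict, and it may need percent-encoding in a query string.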
require 'benchmark'
bigfiles = %w[ddd1 ddd2 ddd3] # all copies of the same 6MB file
puts RUBY_DESCRIPTION
puts "Bigfile size is #{File.size('ddd1')}"
Benchmark.bmbm do |bm|
  bm.report("straight read") { bigfiles.each { |bigfile| File.read(bigfile) } }
  bm.report("read w/ size")  { bigfiles.each { |bigfile| File.read(bigfile, File.size(bigfile)) } }
end
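The benchmark above expects three copies of a 6MB file named ddd1, ddd2, and ddd3. A sketch that generates its inputs with `Tempfile`, so the same comparison can run anywhere; `make_bigfiles` and `compare_reads` are hypothetical helpers, and the file contents are seeded pseudo-random bytes rather than the original data.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical setup: write `copies` identical files of `bytes` pseudo-random
# bytes. The Tempfile objects are returned so they stay referenced (and on
# disk) until the caller is done with them.
def make_bigfiles(copies, bytes)
  data = Random.new(42).bytes(bytes)
  Array.new(copies) do
    f = Tempfile.new('bigfile')
    f.binmode
    f.write(data)
    f.close # flush to disk; the file remains until unlink or GC
    f
  end
end

# The original comparison, parameterized on file paths.
def compare_reads(paths)
  Benchmark.bmbm do |bm|
    bm.report("straight read") { paths.each { |p| File.read(p) } }
    bm.report("read w/ size")  { paths.each { |p| File.read(p, File.size(p)) } }
  end
end

files = make_bigfiles(3, 6 * 1024 * 1024) # three copies of a 6MB file
compare_reads(files.map(&:path))
files.each(&:unlink)
```

`bmbm` runs a rehearsal pass first, which helps keep first-read disk caching from skewing whichever report happens to run first.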
<!--
#########################
TEXT FIELD TYPES
#########################
In all cases, we want to perform NFKC Unicode normalization,
case folding, and ASCII-folding (i.e., removal of accents so
ü => u).
ICUFoldingFilterFactory will give us *all* of those things.
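A rough approximation of those three steps in plain Ruby (2.4+ for `downcase(:fold)`). This is not what ICU does internally: ICU folding applies additional mappings beyond stripping nonspacing marks, so treat `rough_fold` (a hypothetical name) as an illustration of the idea only.

```ruby
# Rough, hypothetical approximation of NFKC + case folding + accent removal.
def rough_fold(s)
  t = s.unicode_normalize(:nfkc) # compatibility normalization, e.g. "ﬁ" -> "fi"
  t = t.downcase(:fold)          # Unicode case folding
  t = t.unicode_normalize(:nfd)  # decompose "ü" into "u" + combining diaeresis
  t = t.gsub(/\p{Mn}/, '')       # drop nonspacing (accent) marks
  t.unicode_normalize(:nfc)      # recompose anything left
end

puts rough_fold("Über ﬁle") # => uber file
```

Decomposing with NFD before deleting `\p{Mn}` is what turns the accent into a separate, removable code point.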
<fieldtype name="text" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="&amp;" replacement=" and " />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="\b([A-Ga-g])[\#♯](\s+|\Z)" replacement="$1 sharp$2" />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="\b([A-Ga-g])\s*[b♭](\s+|\Z)" replacement="$1 flat$2" />
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="\b[Cc]\+\+" replacement="cplusplus" />
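To see what these index-time char filters do to a string, here is a minimal Ruby sketch that applies the same four substitutions in the same order. It assumes Ruby's `\b`, `\s`, and `\Z` behave like Java's for these inputs; `CHAR_FILTERS` and `apply_char_filters` are hypothetical names. The doubled spaces around "and" are expected and are presumably absorbed by the tokenizer, which is not shown in this preview.

```ruby
# Hypothetical Ruby re-creation of the four index-time char filters,
# applied in schema order. gsub mirrors the filter's replace-all behavior.
CHAR_FILTERS = [
  [/&/,                            ' and '],
  [/\b([A-Ga-g])[\#♯](\s+|\Z)/,    '\1 sharp\2'],
  [/\b([A-Ga-g])\s*[b♭](\s+|\Z)/,  '\1 flat\2'],
  [/\b[Cc]\+\+/,                   'cplusplus'],
].freeze

def apply_char_filters(text)
  CHAR_FILTERS.reduce(text) { |acc, (pattern, replacement)| acc.gsub(pattern, replacement) }
end

puts apply_char_filters("Prelude in F# minor") # => Prelude in F sharp minor
puts apply_char_filters("Sonata in Bb")        # => Sonata in B flat
puts apply_char_filters("Learning C++ fast")   # => Learning cplusplus fast
```

Running these as char filters (before tokenization) matters: a tokenizer would otherwise split off or discard `#`, `♭`, and `++`, leaving "F", "B", and "C" indistinguishable from the plain letters.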
require 'marc'
require 'marc4j4r'
require 'benchmark'
iterations = 1
xmlsourcefile = 'topics.xml' # 18k records as a MARC-XML collection
puts RUBY_DESCRIPTION