Jay Luker lbjay

@lbjay
lbjay / gist:1365195
Created November 14, 2011 21:16
montysolr compile errors
compile:
[javac] /home/jluker/workspace/montysolr/build.xml:249: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 72 source files to /home/jluker/workspace/montysolr/bin
[javac] /home/jluker/workspace/montysolr/src/java/org/apache/lucene/queryParser/aqp/processors/AqpQProcessor.java:14: warning: sun.reflect.generics.reflectiveObjects.NotImplementedException is Sun proprietary API and may be removed in a future release
[javac] import sun.reflect.generics.reflectiveObjects.NotImplementedException;
[javac] ^
[javac] /home/jluker/workspace/montysolr/src/java/org/apache/solr/search/InvenioQParserPlugin.java:15: cannot find
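The second warning in this log comes from importing `sun.reflect.generics.reflectiveObjects.NotImplementedException`, an internal JDK class. Assuming it is only used to flag methods that are not implemented yet, the standard-library `java.lang.UnsupportedOperationException` is a portable substitute that javac does not warn about. A minimal sketch; `UnsupportedOperationSketch` and `process` are hypothetical names, not code from `AqpQProcessor`:

```java
// Hypothetical stand-in for a processor method that is not implemented yet.
// It throws the standard java.lang.UnsupportedOperationException instead of the
// internal sun.reflect.generics.reflectiveObjects.NotImplementedException,
// so no proprietary-API warning is emitted at compile time.
class UnsupportedOperationSketch {

    static String process(String input) {
        throw new UnsupportedOperationException("process() is not implemented yet");
    }

    public static void main(String[] args) {
        try {
            process("title:foo");
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```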
lbjay / gist:1368067
Created November 15, 2011 19:30
node wikipulse.js error
~/projects/wikipulse> 14:29:46 5059 $ node wikipulse.js
node.js:201
throw e; // process.nextTick error, or 'error' event on first tick
^
Error: require.paths is removed. Use node_modules folders, or the NODE_PATH environment variable instead.
at Function.<anonymous> (module.js:376:11)
at Object.<anonymous> (/home/lbjay/projects/wikipulse/node_modules/irc-js/lib/irc.js:15:8)
at Module._compile (module.js:432:26)
at Object..js (module.js:450:10)
public static void regexQuery(String pattern) throws Exception {
    // Open the full-text index directory read-only (Lucene 3.x API)
    Directory dir = FSDirectory.open(new File(SOLR_HOME + "/fulltext-build/data/index/"));
    IndexReader reader = IndexReader.open(dir, true);
    IndexSearcher searcher = new IndexSearcher(reader);
    // RegexQuery tests the pattern against terms indexed in the "body" field
    Term t = new Term("body", pattern);
    Query q = new RegexQuery(t);
    System.out.println("query object: " + q);
    System.out.println("stats: " + reader.numDocs());
    // Fetch the top 10 hits
    TopDocs docs = searcher.search(q, 10);
    // Arguments: phraseHighlight = false, fieldMatch = true
    FastVectorHighlighter fvh = new FastVectorHighlighter(false, true);
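`RegexQuery` is a multi-term query: the pattern is checked against individual terms indexed in the `body` field, not against the raw document text, so (assuming the analyzer splits on whitespace) a pattern spanning several words cannot match. The stdlib-only sketch below imitates that term-by-term selection over a hypothetical term list. It assumes whole-term matching via `Matcher.matches()`, which appears to be what the default `java.util.regex` backend of the Lucene 3.x contrib `RegexQuery` does; check the version in use before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Stdlib-only illustration of term-level regex matching, loosely modelled on
// how a multi-term query selects terms. The term list is hypothetical; a real
// index would enumerate the terms of the field instead.
class TermRegexSketch {

    // Return every term that the pattern matches in full (Matcher.matches()).
    static List<String> matchingTerms(List<String> terms, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("fiber", "fibre", "fibrous", "network", "pore");
        System.out.println(matchingTerms(terms, "fib.*"));           // [fiber, fibre, fibrous]
        System.out.println(matchingTerms(terms, "fibrous network")); // []
    }
}
```

The second call returns nothing because no single term contains a space, which is the usual surprise when a phrase-like regex is run against a tokenized field.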
pdf-extract mark --headers --footers --bodies 2259.pdf --trace
/home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf.rb:169:in `invoke_calls': undefined method `pages' for #<PDF::Reader:0x000000027cd880> (NoMethodError)
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf-extract.rb:43:in `block in parse'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf-extract.rb:39:in `each'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf-extract.rb:39:in `parse'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf-extract.rb:54:in `view'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/bin/pdf-extract:116:in `block (4 levels) in <top (required)>'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/bin/pdf-extract:113:in `each'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/bin/pdf-extract:113:in `block (3 levels) in <top (required)>'
from /home/jluker/
pdf-extract extract --sections 2259.pdf --trace
/home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-reader-1.0.0.rc1/lib/pdf/reader/object_hash.rb:73:in `[]': undefined method `to_i' for {:BaseFont=>:"Times-Roman", :Type=>:Font, :Subtype=>:Type1}:Hash (NoMethodError)
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/model/characters.rb:132:in `block in build_fonts'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/model/characters.rb:131:in `each'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/model/characters.rb:131:in `build_fonts'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/model/characters.rb:163:in `block (2 levels) in include_in'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf.rb:81:in `call'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/pdf-extract-0.0.9/lib/pdf.rb:81:in `block (2 levels) in expand_listeners_to_callback_methods'
from /home/jluker/.rvm/gems/ruby-1.9.2-p290/gems/p
lbjay / gist:1535822
Created December 29, 2011 19:36
pdf-extract extract --sections
<?xml version="1.0"?>
<pdf>
<section line_height="9.96" font="KAAPYP+CMR10" letter_ratio="0.02" year_ratio="0.0" cap_ratio="0.0" name_ratio="0.2767857142857143" word_count="112" lateness="0.06666666666666667" reference_score="4.6">
<line x_offset="0.0" y_offset="117.01" spacing="0.5">Abstract.</line>
<line x_offset="57.0" y_offset="116.52" spacing="-9.47">Theory is presented for the distributions of local process intensity and</line>
<line x_offset="0.0" y_offset="103.56" spacing="3.0">local average pore dimensions in random fibrous materials. For complete partitioning</line>
<line x_offset="0.0" y_offset="90.6" spacing="3.0">of the network into contiguous square zones, the variance of local process
junit-sequential:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit] jar:file:/usr/share/java/ant.jar!/org/apache/tools/ant/Project.class
[junit] and jar:file:/usr/share/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit] Testsuite: invenio.montysolr.TestMontySolrBasicOperations
[junit] Warning: we add the default folder to sys.path:
[junit] /home/jluker/workspace/montysolr/build/dist
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.6 sec
[junit]
[junit] ERROR:root:Unknown target; message_id=*:unknown_call
lbjay / gist:2007670
Created March 9, 2012 17:31
python import slope
import os
import sys
import time
import httplib2
import threading
from Queue import Queue
import robotparser as rp
from optparse import OptionParser
from datetime import datetime, timedelta
lbjay / BibstemTransformer
Created August 16, 2012 19:26
Bibstem Transformer
package org.apache.solr.handler.dataimport;

import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class BibstemTransformer extends Transformer {
    private static Pattern fourDigit = Pattern.compile("^\\d{4}.+");
    private static Pattern lastFour = Pattern.compile("^[\\.\\d]+$");
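The two patterns in the `BibstemTransformer` preview can be tried in isolation. The sketch below copies them verbatim and checks a few made-up strings with `Matcher.matches()`; the class name, helper names, and sample inputs are hypothetical, and how the real transformer uses a match is not visible in the truncated preview.

```java
import java.util.regex.Pattern;

// Exercises the two regular expressions from the BibstemTransformer preview.
// Helper names and sample inputs are hypothetical.
class BibstemPatternDemo {
    // Copied verbatim from the gist preview.
    private static final Pattern fourDigit = Pattern.compile("^\\d{4}.+");
    private static final Pattern lastFour = Pattern.compile("^[\\.\\d]+$");

    // True if the string starts with four digits followed by at least one more character.
    static boolean startsWithFourDigits(String s) {
        return fourDigit.matcher(s).matches();
    }

    // True if the string is a non-empty run of dots and digits only.
    static boolean onlyDotsAndDigits(String s) {
        return lastFour.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1999abc", "2011", "..123", "12.34", "abc"};
        for (String s : samples) {
            System.out.println(s + " -> fourDigit=" + startsWithFourDigits(s)
                    + ", lastFour=" + onlyDotsAndDigits(s));
        }
    }
}
```

Note that `lastFour`, despite its name, does not limit the length: it accepts any non-empty string made only of dots and digits, and `fourDigit` rejects a bare four-digit string such as `2011` because `.+` needs at least one more character.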
/proj/adsx/fulltext-crawler> 10:04:52 5067 (master) $ curl -H "If-Modified-Since: Mon, 14 May 2012 13:56:18 GMT" -H "user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1 (NASA/ADS crawler; …)" -I http://www.agu.org/journals/wr/wr1205/2011WR011460/2011WR011460.xml
HTTP/1.1 304 Not Modified
Date: Tue, 11 Sep 2012 14:08:55 GMT
Server: Apache/1.3.34 Ben-SSL/1.57 (Unix) PHP/5.2.8 mod_jk/1.2.1
ETag: "85dd93-1c21e-4fa7d482"
/proj/adsx/fulltext-crawler> 10:06:29 5068 (master) $ curl -H "If-Modified-Since: Mon, 14 May 2012 13:56:18 GMT" -H "user-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1 (NASA/ADS crawler; …)" -I http://www.agu.org/journals/wr/wr1205/2011WR011460/2011WR011460.xml
HTTP/1.1 200 OK
Date: Tue, 11 Sep 2012 14:09:09 GMT
Server: Apache/1.3.34 Ben-SSL/1.57 (Unix) PHP/5.2.8 mod_jk/1.2.1
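The two `curl -I` runs above send the same `If-Modified-Since` header about two minutes apart, yet get `304 Not Modified` and then `200 OK`. Per RFC 9110's definition of `If-Modified-Since`, a GET or HEAD request (`curl -I` sends HEAD) should get 304 when the resource's last modification date is earlier than or equal to the date in the header. The sketch below applies that rule with `java.time`; the `Last-Modified` values are hypothetical, since none is visible in the output above.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch of the If-Modified-Since evaluation a server performs for GET/HEAD.
// The Last-Modified values used in main() are hypothetical.
class ConditionalGetSketch {

    // Parse an HTTP-date such as "Mon, 14 May 2012 13:56:18 GMT".
    static ZonedDateTime parseHttpDate(String value) {
        return ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    // 304 when the resource is not newer than the client's date, otherwise 200.
    static int statusFor(String ifModifiedSince, String lastModified) {
        ZonedDateTime since = parseHttpDate(ifModifiedSince);
        ZonedDateTime modified = parseHttpDate(lastModified);
        return modified.isAfter(since) ? 200 : 304;
    }

    public static void main(String[] args) {
        String header = "Mon, 14 May 2012 13:56:18 GMT"; // value sent by the curl commands above
        System.out.println(statusFor(header, "Mon, 14 May 2012 13:56:18 GMT")); // 304
        System.out.println(statusFor(header, "Tue, 11 Sep 2012 14:00:00 GMT")); // 200
    }
}
```

Under this rule, an unchanged resource should keep answering 304 to the same header, so the 200 on the second run means either the resource changed between the two requests or the server evaluated the condition inconsistently; the truncated output does not show which.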