Skip to content

Instantly share code, notes, and snippets.

String xmlFile = "Wikipedia-Category-GlobalWarming-20090919043526.xml";
InputStream is = WikipediaSeederTest.class.getClassLoader().getResourceAsStream(xmlFile);
A="jar $JOB_JAR com.attinteractive.ar.hadoopjobs.examplejob.Main $1 $2"
B="mvn exec:exec -Dexec.args="
exec $B"$A"
# instant.rake
# Rake rules for compiling and running trivial Java programs
#
# Usage: rake com.example.MonkeyShines
# Source goes under ./src
# Classes end up under ./target
require 'rake/clean'
libs = FileList["lib/*"]
# i don't have a rails app handy, but I think this works if you have dependt destroy set for the child relationship
post.comments.delete(@some_comment)
package play;
import org.jruby.embed.PathType;
import org.jruby.embed.ScriptingContainer;
public class ClassUseSample {
public ScriptingContainer container;
public String file;
public ClassUseSample(String filename) {
import org.jruby.embed.PathType;
import org.jruby.embed.ScriptingContainer;
public class Embedder {
public ScriptingContainer container;
public String file;
public Embedder(String filename) {
file = filename;
container = new ScriptingContainer();
# Use Jruby to read hadoop sequence files
def load_libs(libs)
Dir.glob(File.join(libs,"*.jar")).each { |f|
require f
}
end
load_libs ENV["HADOOP_HOME"]
load_libs File.join(ENV["HADOOP_HOME"], 'lib')
engine = Moonstone::Engine.new
engine.index(@some_docs)
engine.reader do |r|
r.terms.for_field('name').sort.should == %w{ burger king depeche mode }.sort
end
#!/bin/bash
# backup your delicious bookmarks
curl -k --user `cat password.txt` -o backup.xml -O 'https://api.del.icio.us/v1/posts/all'
Yes sorry.
Basically what we are trying is to constraint the effect of the raw frequency (saturate the frequency).
In Lucene this is carried out with the root square of the frequency, another classical approach
is to use the log. With both approaches we avoid giving a linear 'importance' to the frequency.
BM25 is a bit tricky, it parametrises the 'saturation' of the frequency with a parameter k1, with the
equation weight(t)/(weight(t)+k1). Usually k1 is fixed to 2, but it can be fixed by collection.