@edbond
Created April 7, 2009 13:30
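#!/usr/bin/ruby
# Fuzzy title lookup: score titles stored in CouchDB by the trigrams
# they share with a query word.
require 'rubygems'
require 'active_support' # provides String#mb_chars
require 'couchrest'

# Ruby 1.8 only: handle strings as UTF-8.
$KCODE = 'u'
require 'jcode'

NGRAM_SIZE = 3
@db = CouchRest.database("http://localhost:5984/geo_#{NGRAM_SIZE}")

terms = Hash.new(0.0) # candidate title => score
word = (ARGV[0] || 'кришатик').mb_chars
puts word.to_s.inspect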
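# Count, for each candidate title, how many of the query's n-grams it shares.
(0..word.size - NGRAM_SIZE).each do |i|
  s = word.slice(i, NGRAM_SIZE)
  rows = @db.view('z/trgm', :key => s.to_s)['rows']
  next if rows.empty?
  rows.each do |r|
    k = r["value"]
    terms[k] += 1
  end
end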
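# Normalize by length: divide each score by the title's length difference
# from the query (titles of equal length keep their raw count).
terms.keys.each do |k|
  d = (k.mb_chars.size - word.size).abs
  next if d.zero?
  terms[k] /= d.to_f
end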
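# Sort ascending by score. Array#last(4) also works when fewer than four
# candidates were found, where [-4..-1] would return nil.
ranked = terms.sort_by { |_, score| score }
puts ranked.last(4).inspect # up to four best matches, best last
puts ranked.last.inspect    # single best match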
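// View queried as 'z/trgm' by the Ruby script.
// Example output: key "abc" -> ["cabc", "abck", "jabcr"]
// map
function(doc) {
  if (!doc.title) return; // skip documents without a title
  var l = doc.title.length;
  // Emit every 3-character substring of the title, keyed to the full title.
  for (var i = 0; i < l - 2; i++) {
    var s = doc.title.substr(i, 3);
    emit(s, doc.title);
  }
}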
// reduce?
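
The lookup can also be tried without a running CouchDB. What follows is a minimal in-memory sketch, not part of the original gist: `ngrams`, `build_index`, and `rank` are hypothetical helper names, and the sample titles are taken from the example output in the view comment. It assumes a modern Ruby (1.9 or later), where strings are already character-aware, so `mb_chars` is not needed. `build_index` stands in for the map function, and `rank` mirrors the counting and length normalization done by the script.

```ruby
NGRAM_SIZE = 3

# Split a string into overlapping character n-grams.
def ngrams(text, n = NGRAM_SIZE)
  chars = text.chars
  return [] if chars.size < n
  (0..chars.size - n).map { |i| chars[i, n].join }
end

# Stand-in for the map function: n-gram => titles containing it
# (one entry per occurrence, like emit).
def build_index(titles)
  index = Hash.new { |h, k| h[k] = [] }
  titles.each do |title|
    ngrams(title).each { |g| index[g] << title }
  end
  index
end

# Count shared n-grams per title, then divide by the length difference
# from the query (titles of equal length keep their raw count).
# Returns [title, score] pairs sorted ascending, so the best match is last.
def rank(index, query)
  scores = Hash.new(0.0)
  ngrams(query).each do |g|
    index.fetch(g, []).each { |title| scores[title] += 1 }
  end
  scores.keys.each do |title|
    d = (title.size - query.size).abs
    scores[title] /= d unless d.zero?
  end
  scores.sort_by { |_, score| score }
end

index = build_index(%w[cabc abck jabcr xyz])
p index["abc"]              # => ["cabc", "abck", "jabcr"]
p rank(index, "abc").first  # => ["jabcr", 0.5]
```

Because the score is divided by the length difference, "jabcr" (two characters longer than the query) ranks below "cabc" and "abck" (one character longer), even though all three share the same single trigram.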