# coding: utf-8
require 'rubygems'
# require fileutils to use FileUtils. Otherwise an error gets raised.
# uninitialized constant Picky::Backends::Helpers::File::FileUtils (NameError)
# from /Volumes/Home/Projects/utf8/picky/server/lib/picky/backends/prepared/text.rb:66:in `open'
#
require 'fileutils'
require 'picky'
include Picky
Utf8Symbol = Struct.new(:id, :symbol, :bytecode, :html_entity, :description)
symbols = [
  Utf8Symbol.new(1, '🐒', 'f09f9092', '🐒', 'MONKEY'),
  Utf8Symbol.new(2, '🐵', 'f09f90b5', '🐵', 'MONKEY FACE'),
  Utf8Symbol.new(3, '🙈', 'f09f9988', '🙈', 'SEE-NO-EVIL MONKEY'),
  Utf8Symbol.new(4, '🙉', 'f09f9989', '🙉', 'HEAR-NO-EVIL MONKEY'),
  Utf8Symbol.new(5, '🙊', 'f09f998a', '🙊', 'SPEAK-NO-EVIL MONKEY')
]
monkey_index = Index.new(:monkey_index) do
  source { symbols }
  indexing removes_characters: /[^a-z0-9\-\s\"\~\*\:\,]/i,
           splits_text_on: /[\s\-]/
  category :description
end
monkey_search = Search.new(monkey_index) do
  searching removes_characters: /[^a-z0-9\-\s\/\_\&\.\"\~\*\:\,]/i, # Picky needs control chars *"~:, to pass through.
            stopwords: /\b(and|the|of|it|in|for)\b/i,
            splits_text_on: /[\s\/\-\&]+/
end
monkey_index.load rescue monkey_index.index
p evil_monkeys = monkey_search.search("evil").ids
Thanks for your help.
I'm really trying to get it working, but I fail even with this script.
No matter which Ruby I use, the result is always an empty array. I think something else must be wrong with my script. I'll sleep on it.
$ rm -rf index/; ruby picky-tryout.rb
.Picky is indexing using a single process: TD Done in 0s.
[]
$ rm -rf index/; macruby picky-tryout.rb
.Picky is indexing using a single process: TD Done in 0s.
[]
:S
I'll run it in the morning and will write you if I am successful. Cheers!
Hi Lukas,
I just saw what it is. I was too focused on the Picky code when in fact it was in the Ruby part. A Struct takes a list of params, not a hash of params. So it should be
Utf8Symbol.new(1, '', 'f09f9092', '🐒', 'MONKEY'),
Utf8Symbol.new(2, '', 'f09f90b5', '🐵', 'MONKEY FACE'),
Utf8Symbol.new(3, '', 'f09f9988', '🙈', 'SEE-NO-EVIL MONKEY'),
Utf8Symbol.new(4, '', 'f09f9989', '🙉', 'HEAR-NO-EVIL MONKEY'),
Utf8Symbol.new(5, '', 'f09f998a', '🙊', 'SPEAK-NO-EVIL MONKEY')
(The gist comment had problems with the special characters, so I removed them)
What happens now is that a hash is passed to the @id instance variable, the rest of the attributes being nil.
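To make the difference concrete, here is a minimal sketch in plain Ruby (no Picky needed). It assumes the original call passed a single hash; the hash is written with explicit braces so that it counts as one positional argument on any Ruby version. The variable names wrong and right are just for illustration.

```ruby
Utf8Symbol = Struct.new(:id, :symbol, :bytecode, :html_entity, :description)

# A hash in braces is ONE positional argument, so it fills the first
# member (:id) and leaves every other member nil.
wrong = Utf8Symbol.new({ id: 1, description: 'MONKEY' })
p wrong.id          # the whole hash
p wrong.description # nil, so there is nothing to index

# A plain list of values fills the members in declaration order.
right = Utf8Symbol.new(1, '', 'f09f9092', '', 'MONKEY')
p right.id          # 1
p right.description # "MONKEY"
```

With description being nil for every record, the :description category has no text, which would explain the empty result.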
You should also let the hyphen through in removes_characters, so that it can split on it, if I am not mistaken.
Cheers :)
Haha, shame on me... I overlooked the problem with the Struct initialization too. I'm more used to OpenStruct these days...
And indeed removes_characters: must keep the hyphen. So I changed those on indexing and searching.
Anyway, it works now! Thank you so much for the time you spent helping me out with this.
$ rm -rf index/; macruby picky-tryout.rb
.Picky is indexing using a single process: TD Done in 0s.
[3, 4, 5]
So that's where those evil monkeys are...!
No worries, glad to help :)
Indexing by default splits only on \s, so SEE-NO-EVIL is indexed as is. Add the option splits_text_on: /[\s\-]/ so that EVIL is indexed as a word of its own and can be found.
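Roughly what the two options do to a description can be imitated in plain Ruby. This is only a sketch of the idea, not Picky's actual tokenizer: the tokenize helper and its keyword names are made up for illustration, and only the two regexes are taken from the script.

```ruby
# Regexes as used in the indexing options of the script.
REMOVES_CHARACTERS = /[^a-z0-9\-\s\"\~\*\:\,]/i
SPLITS_TEXT_ON     = /[\s\-]/

# Hypothetical helper: first drop unwanted characters, then split into words.
def tokenize(text, removes: REMOVES_CHARACTERS, splits_on: SPLITS_TEXT_ON)
  text.gsub(removes, '').split(splits_on).reject(&:empty?)
end

# Hyphen kept and split on: EVIL becomes a word of its own.
p tokenize('SEE-NO-EVIL MONKEY')
# Splitting on whitespace only: the hyphenated term stays in one piece.
p tokenize('SEE-NO-EVIL MONKEY', splits_on: /\s/)
# Hyphen removed before splitting: the words are glued together.
p tokenize('SEE-NO-EVIL MONKEY', removes: /[^a-z0-9\s]/i)
```

Only the first variant yields EVIL as a separate token, which is why the hyphen has to survive removes_characters and also appear in splits_text_on.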
I hope that helps :)
Nice script, btw :)