@sids
Created September 25, 2009 04:53
# Example 1: minimal training and classification
require 'whatever/classifier'
classifier = Classifier.new
# training:
classifier.add_document(:text => "blah", :class => :a)
classifier.add_document(:text => "bleh", :class => :b)
# get model; training is automatically finalised
model = classifier.get_model
# do classification
puts model.classify(:text => "blah") # OUTPUT: a
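The gist does not say how `Classifier` builds its model. As one possible reading of the add_document / get_model / classify lifecycle, here is a minimal sketch. It assumes a multinomial naive Bayes model with add-one smoothing and plain whitespace tokenization. `SketchClassifier` and `SketchModel` are hypothetical names, not part of the `whatever` library.

```ruby
require 'set'

# Hypothetical stand-in for the model returned by get_model.
# Assumes multinomial naive Bayes with add-one (Laplace) smoothing.
class SketchModel
  def initialize(doc_counts, word_counts, vocab)
    @doc_counts = doc_counts   # class => number of training documents
    @word_counts = word_counts # class => { word => occurrences }
    @vocab = vocab             # Set of every word seen in training
    @total_docs = doc_counts.values.sum
  end

  # Returns the class with the highest log-probability for the text.
  def classify(opts)
    words = opts[:text].downcase.split
    @doc_counts.keys.max_by do |klass|
      counts = @word_counts[klass]
      total = counts.values.sum
      log_prior = Math.log(@doc_counts[klass].to_f / @total_docs)
      # Smoothed per-word likelihoods, summed in log space.
      log_likelihood = words.sum { |w| Math.log((counts[w] + 1.0) / (total + @vocab.size)) }
      log_prior + log_likelihood
    end
  end
end

# Hypothetical stand-in for Classifier: it only stores documents until
# get_model is called, which "finalises" training by counting words.
class SketchClassifier
  def initialize
    @docs = []
  end

  def add_document(opts)
    @docs << [opts[:text], opts[:class]]
  end

  def get_model
    doc_counts = Hash.new(0)
    word_counts = Hash.new { |h, k| h[k] = Hash.new(0) }
    vocab = Set.new
    @docs.each do |text, klass|
      doc_counts[klass] += 1
      text.downcase.split.each do |w|
        word_counts[klass][w] += 1
        vocab << w
      end
    end
    SketchModel.new(doc_counts, word_counts, vocab)
  end
end

classifier = SketchClassifier.new
classifier.add_document(:text => "blah", :class => :a)
classifier.add_document(:text => "bleh", :class => :b)
model = classifier.get_model
puts model.classify(:text => "blah") # prints "a"
```

Deferring all counting to `get_model` mirrors the gist's comment that training is finalised automatically when the model is requested.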

# Example 2: custom tokenizer, preprocessors and delayed preprocessing
require 'whatever/classifier'
require 'whatever/tokenizers/standard'
require 'whatever/preprocessors/stop_word_remover'
require 'whatever/preprocessors/kstemmer'
classifier = Classifier.new(:delayed_preprocessing => true)
tokenizer = StandardTokenizer.new
classifier.set_tokenizer(tokenizer)
# other useful tokenizers could be: PhraseTokenizer, KeyTermsTokenizer
stopper = StopWordRemover.new
classifier.add_preprocessor(stopper)
stemmer = KStemmer.new(:dictionary => 'english')
classifier.add_preprocessor(stemmer)
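# documents can be added without a class and labelled later by id
classifier.add_document(:id => 1, :text => "blah")
classifier.add_document(:id => 2, :text => "bleh", :class => :b)
classifier.set_class(:id => 1, :class => :a)
# run the tokenizer and preprocessors (deferred by :delayed_preprocessing)
classifier.preprocess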
# get model; training is automatically finalised
model = classifier.get_model
# do classification
puts model.classify(:text => "blah") # OUTPUT: a
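The second example suggests a pluggable pipeline: a tokenizer, an ordered list of preprocessors, documents that can be labelled after they are added, and (with `:delayed_preprocessing`) a separate `preprocess` step. The sketch below is one way that could work, under these assumptions: preprocessors run in the order they are added, each one takes and returns an array of tokens, and delayed documents stay raw until `preprocess` is called. All class names are hypothetical, and `ToySuffixStemmer` is a crude suffix stripper standing in for KStem, not an implementation of the Krovetz algorithm.

```ruby
# Hypothetical sketch of the second example's pipeline; none of these
# classes come from the real 'whatever' library.

# Splits text into lowercase alphanumeric tokens.
class SimpleTokenizer
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

# Drops tokens that appear in a tiny, illustrative stop list.
class SimpleStopWordRemover
  STOP_WORDS = %w[a an and are is of the to].freeze

  def process(tokens)
    tokens.reject { |t| STOP_WORDS.include?(t) }
  end
end

# Toy stemmer: strips one common suffix, keeping a stem of at least three
# characters. It is NOT KStem, which checks stems against a dictionary.
class ToySuffixStemmer
  SUFFIXES = %w[ing ed s].freeze

  def process(tokens)
    tokens.map do |t|
      suffix = SUFFIXES.find { |s| t.end_with?(s) && t.length > s.length + 2 }
      suffix ? t.delete_suffix(suffix) : t
    end
  end
end

class SketchPipelineClassifier
  Doc = Struct.new(:id, :text, :klass, :tokens)

  def initialize(opts = {})
    @delayed = opts.fetch(:delayed_preprocessing, false)
    @tokenizer = SimpleTokenizer.new
    @preprocessors = []
    @docs = {}
  end

  def set_tokenizer(tokenizer)
    @tokenizer = tokenizer
  end

  # Preprocessors run in the order they were added.
  def add_preprocessor(preprocessor)
    @preprocessors << preprocessor
  end

  # Stores the raw text; processes it now unless preprocessing is delayed.
  def add_document(opts)
    doc = Doc.new(opts[:id], opts[:text], opts[:class], nil)
    @docs[doc.id] = doc
    run_pipeline(doc) unless @delayed
  end

  # Labels (or relabels) a previously added document by id.
  def set_class(opts)
    @docs.fetch(opts[:id]).klass = opts[:class]
  end

  # Runs the pipeline on every document that has not been processed yet.
  def preprocess
    @docs.each_value { |doc| run_pipeline(doc) if doc.tokens.nil? }
  end

  # Returns { class => tokens } for documents that are labelled and processed.
  def training_tokens
    ready = @docs.values.select { |d| d.klass && d.tokens }
    ready.group_by(&:klass).transform_values { |docs| docs.flat_map(&:tokens) }
  end

  private

  def run_pipeline(doc)
    tokens = @tokenizer.tokenize(doc.text)
    doc.tokens = @preprocessors.reduce(tokens) { |toks, pre| pre.process(toks) }
  end
end

classifier = SketchPipelineClassifier.new(:delayed_preprocessing => true)
classifier.add_preprocessor(SimpleStopWordRemover.new)
classifier.add_preprocessor(ToySuffixStemmer.new)
classifier.add_document(:id => 1, :text => "The cats jumped")
classifier.add_document(:id => 2, :text => "Dogs are barking", :class => :b)
classifier.set_class(:id => 1, :class => :a)
classifier.preprocess
p classifier.training_tokens # maps :a to ["cat", "jump"] and :b to ["dog", "bark"]
```

In this sketch, `training_tokens` ignores documents that are still unlabelled or unprocessed, so calling it before `preprocess` on a delayed classifier returns an empty hash.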