@dkincaid
Created December 29, 2011 23:08
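(import '(java.io StringReader)
        '(org.apache.lucene.analysis.standard StandardAnalyzer)
        '(org.apache.lucene.analysis.tokenattributes TermAttribute)
        '(org.apache.lucene.util Version))

(defn standard-tokenizer
  "Uses the Lucene StandardTokenizer to tokenize the given text. Returns a vector containing
  the tokens."
  [text]
  (let [analyzer (StandardAnalyzer. Version/LUCENE_31)
        tokenstream (.tokenStream analyzer "field" (StringReader. text))
        termatt (.addAttribute tokenstream TermAttribute)
        terms []]
    ;; Prints each token as it is read; `terms` is never updated, so the function returns nil.
    (while (.incrementToken tokenstream)
      (print (.term termatt)))))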
@ordnungswidrig

(defn standard-tokenizer
  "Uses the Lucene StandardTokenizer to tokenize the given text. Returns a vector containing
  the tokens."
  [text]
  (let [analyzer (StandardAnalyzer. Version/LUCENE_31)
        tokenstream (.tokenStream analyzer "field" (StringReader. text))
        termatt (.addAttribute tokenstream TermAttribute)
        terms (atom [])]
    ;; Accumulate each token in the atom, then return the resulting vector.
    (while (.incrementToken tokenstream)
      (swap! terms conj (.term termatt)))
    @terms))
