Created February 22, 2013 04:15 by jbowles (gist jbowles/5010687)
A quick look at what can be done with Treat, a natural language processing toolkit for Ruby
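require 'treat'
include Treat::Core::DSL
# Build two documents from Wikipedia pages (fetching them requires network access)
doc1 = document('http://en.wikipedia.org/wiki/List_of_best-selling_fiction_authors')
doc2 = document('http://en.wikipedia.org/wiki/List_of_best-selling_books')
# Chunk the documents, segment them into sentences, then tokenize the sentences
[doc1, doc2].apply(:chunk, :segment, :tokenize)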
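
# Sketch (plain Ruby, NOT Treat's implementation): roughly what :segment and
# :tokenize produce. Treat delegates these steps to its own configurable
# segmenters and tokenizers; naive_segment and naive_tokenize are hypothetical
# helpers built on simple regexes, good enough only for a demo.
def naive_segment(text)
  # Split after '.', '!' or '?' when followed by whitespace
  text.strip.split(/(?<=[.!?])\s+/)
end

def naive_tokenize(sentence)
  # A word (allowing internal apostrophes/hyphens) or a single punctuation mark
  sentence.scan(/\w+(?:['-]\w+)*|[^\w\s]/)
end

sample_sentences = naive_segment("Treat is a toolkit. It tokenizes text! Does it parse?")
sample_tokens = sample_sentences.map { |sent| naive_tokenize(sent) }
p sample_sentences.count  # → 3
p sample_tokens.first     # → ["Treat", "is", "a", "toolkit", "."]
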
# Check it! (evaluate these lines in irb to see each result)
doc1.sentences
doc1.sentences.count
doc1.sentences.first
doc1.words
doc1.tokens
doc1.phrases
doc1.phrases.first
# You can define your own phrases and sentences
phrase_1 = phrase('this is a phrase')
phrase_2 = phrase('this is another phrase')
# A deeper dive into complicated objects
s = sentence('This is a sentence, with phrases in it!')
s.to_s
# Print the tree as we go through the decomposition/construction.
# Only some constructs are available or evaluated on the object at each
# stage of decomposing the sentence and building the Ruby object.
s.print_tree
# Tokenize before you apply :parse and :category
s.apply :tokenize
s.tokens
s.print_tree              # the tree's leaves should match the tokens
s.tokens.each { |t| p t } # tokens include punctuation such as ',' and '!'
s.words.each { |w| p w }  # words: the same tokens minus the punctuation
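
# Sketch (plain Ruby, not Treat): a token list keeps punctuation, while a
# "words" view drops it. demo_tokens is a hand-written token list for the
# sentence above; filtering on purely alphanumeric tokens is an assumption
# standing in for Treat's own word/punctuation distinction.
demo_tokens = ["This", "is", "a", "sentence", ",", "with", "phrases", "in", "it", "!"]
demo_words  = demo_tokens.grep(/\A\w+\z/) # keep only purely alphanumeric tokens
p demo_tokens.count  # → 10
p demo_words.count   # → 8
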
s.apply :parse                   # calls out to the JVM
s.print_tree                     # should look different now: tokens nested under phrases
s.phrases.each { |phrase| p phrase }
s.tokens.each { |token| p token[:tag] }
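
# Sketch (plain Ruby, not Treat's print_tree): why the tree "looks different"
# after :parse. Tokenizing alone gives a flat sentence -> tokens tree; parsing
# nests the tokens under phrase nodes (NP, VP, ...). parsed_sketch is a
# hand-built, simplified tree for "This is a sentence" that only approximates
# a parser's output; sketch_tree_lines is a hypothetical helper.
def sketch_tree_lines(node, depth = 0)
  # Leaves show "TAG word"; phrase nodes show just their tag
  label = node[:word] ? "#{node[:tag]} #{node[:word]}" : node[:tag]
  children = node[:children] || []
  ["#{'  ' * depth}#{label}"] + children.flat_map { |child| sketch_tree_lines(child, depth + 1) }
end

parsed_sketch = { tag: 'S', children: [
  { tag: 'NP', children: [{ tag: 'DT', word: 'This' }] },
  { tag: 'VP', children: [
    { tag: 'VBZ', word: 'is' },
    { tag: 'NP', children: [{ tag: 'DT', word: 'a' }, { tag: 'NN', word: 'sentence' }] }
  ] }
] }
puts sketch_tree_lines(parsed_sketch) # one indented line per node, depth-first
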
s.apply :category # assign a category (noun, verb, ...) to each word
s.print_tree
s.verb_count
s.noun_count
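
# Sketch (plain Ruby, not Treat's implementation): verb_count and noun_count
# report how many words fall into each category. Assuming Penn Treebank tags
# as input, a comparable count groups tags by prefix (NN* = noun, VB* = verb).
# demo_tagged and the category_of helper are hypothetical.
def category_of(tag)
  case tag
  when /\ANN/ then :noun
  when /\AVB/ then :verb
  else :other
  end
end

demo_tagged = [["This", "DT"], ["is", "VBZ"], ["a", "DT"], ["sentence", "NN"],
               ["with", "IN"], ["phrases", "NNS"], ["in", "IN"], ["it", "PRP"]]
demo_counts = demo_tagged.each_with_object(Hash.new(0)) do |(_word, tag), acc|
  acc[category_of(tag)] += 1
end
p demo_counts[:noun] # → 2
p demo_counts[:verb] # → 1
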
# NP = Noun Phrase, gets you all noun phrases
s.each_phrase_with_tag('NP') do |np_phrase|
  puts np_phrase.to_s
end
# VP = Verb Phrase, gets you all verb phrases
s.each_phrase_with_tag('VP') do |vp_phrase|
  puts vp_phrase.to_s
end
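
# Sketch (plain Ruby, not Treat's implementation): each_phrase_with_tag can be
# pictured as a depth-first walk that yields every node whose tag matches.
# demo_tree is a hand-built, simplified parse of "This is a sentence";
# each_node_with_tag and leaf_text are hypothetical helpers.
def each_node_with_tag(node, tag, &block)
  block.call(node) if node[:tag] == tag
  (node[:children] || []).each { |child| each_node_with_tag(child, tag, &block) }
end

def leaf_text(node)
  # A leaf returns its word; a phrase joins the words beneath it
  node[:word] || (node[:children] || []).map { |child| leaf_text(child) }.join(' ')
end

demo_tree = { tag: 'S', children: [
  { tag: 'NP', children: [{ tag: 'DT', word: 'This' }] },
  { tag: 'VP', children: [
    { tag: 'VBZ', word: 'is' },
    { tag: 'NP', children: [{ tag: 'DT', word: 'a' }, { tag: 'NN', word: 'sentence' }] }
  ] }
] }
each_node_with_tag(demo_tree, 'NP') { |np| puts leaf_text(np) } # prints "This", then "a sentence"
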