@lguardiola
Created November 8, 2009 19:18
Add n-gram capabilities to the String class and calculate n-gram weights (occurrence counts).
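Here an n-gram is a run of n consecutive words. As a standalone sketch of the same windowing, Ruby's Enumerable#each_cons can produce the n-grams of a word array directly. `word_ngrams` is a hypothetical helper, not part of the gist, and it skips the punctuation and stopword handling that `ngram` does.

```ruby
# Hypothetical helper: all 1-grams up to +level+-grams of a word array,
# shortest first, each n-gram joined with a single space.
def word_ngrams(words, level)
  (1..level).flat_map do |n|
    # each_cons(n) yields every window of n consecutive words, left to right.
    words.each_cons(n).map { |window| window.join(' ') }
  end
end

p word_ngrams(%w[the quick brown fox], 2)
# prints ["the", "quick", "brown", "fox", "the quick", "quick brown", "brown fox"]
```

each_cons(n) yields nothing when the array has fewer than n words, which matches the `while range.max < data.size` guard in `ngram`.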
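class String
  # Returns all n-grams from 1 up to +level+ words long, shortest first.
  # Commas, periods and semicolons are removed before splitting on
  # whitespace; any word listed in +stopwords+ (given individually or as
  # arrays, which are flattened) is dropped. Matching is case-sensitive.
  def ngram(level, *stopwords)
    stopwords = stopwords.flatten
    data = delete(',.;').split.reject { |word| stopwords.include?(word) }
    result = []
    # One starting window per n-gram size: 0..0, 0..1, ..., 0..level-1.
    ranges = (0...level).map { |i| 0..i }
    ranges.each do |range|
      # Slide the window one word to the right until it passes the end.
      while range.max < data.size
        result << data[range].join(' ')
        range = range.min.succ..range.max.succ
      end
    end
    result
  end

  # Counts how often each n-gram occurs and returns [ngram, count] pairs,
  # most frequent first. The counts are the n-gram weights.
  def tag_cloud(level = 1, *stopwords)
    grouped = ngram(level, *stopwords).group_by { |gram| gram }
    weights = grouped.map { |gram, occurrences| [gram, occurrences.size] }
    weights.sort_by { |_gram, count| count }.reverse
  end

  # Original, misspelled method name kept as an alias for existing callers.
  alias tag_could tag_cloud
end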
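The weights the tag-cloud method returns are plain occurrence counts. A minimal sketch of that counting step on its own, using a counting hash instead of `group_by` so no per-n-gram arrays are built. `ngram_weights` is a hypothetical helper, and unlike the gist (whose tie order is unspecified) it breaks ties alphabetically so the output is deterministic.

```ruby
# Hypothetical helper: counts each n-gram and returns [ngram, count] pairs,
# most frequent first (the same shape the gist's tag-cloud method returns).
def ngram_weights(grams)
  counts = Hash.new(0) # missing keys start at zero
  grams.each { |gram| counts[gram] += 1 }
  # Sort by descending count; ties fall back to alphabetical order.
  counts.sort_by { |gram, count| [-count, gram] }
end

p ngram_weights(%w[to be or not to be])
# prints [["be", 2], ["to", 2], ["not", 1], ["or", 1]]
```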