Gist kpumuk/30566, created December 1, 2008 00:48
# Given a set of documents, this method returns a list of the tags associated
# with them, ordered by the number of documents each tag occurs on. Tags that
# only appear on a single document are excluded.
# If the caller supplies the name of a tag to exclude, that tag is not included
# in the results.
def self.related_tags(docs, exclude = nil)
  related = Hash.new(0) # tag => number of times it was seen

  docs.each_with_index do |doc, i|
    break if i >= 20 # only consider the first 20 docs
    doc.word_tags.each do |tag| # count how many times each tag occurs
      next if exclude && tag.name == exclude # skip the tag the caller excluded
      related[tag] += 1
    end
  end

  # Keep the 25 most frequent tags as [tag, count] pairs, highest count first
  related = related.sort_by { |_tag, count| -count }.first(25)

  # Return only the tags that occur more than once
  related.select { |_tag, count| count > 1 }.map { |tag, _count| tag }
end
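To see what the method returns end to end, here is a minimal, self-contained sketch. `Tag`, `Doc`, and `TagStats` are hypothetical stand-ins (the gist does not show the real models); the only assumption is that tags respond to `#name` and documents to `#word_tags`, which is all the method relies on.

```ruby
# Hypothetical stand-ins for the models the gist assumes; only the methods
# related_tags touches (Tag#name and Doc#word_tags) are provided.
Tag = Struct.new(:name)
Doc = Struct.new(:word_tags)

module TagStats
  # Mirrors the method above so the example can run on its own.
  def self.related_tags(docs, exclude = nil)
    related = Hash.new(0)
    docs.each_with_index do |doc, i|
      break if i >= 20 # only the first 20 documents are scanned
      doc.word_tags.each do |tag|
        next if exclude && tag.name == exclude
        related[tag] += 1
      end
    end
    related = related.sort_by { |_tag, count| -count }.first(25)
    related.select { |_tag, count| count > 1 }.map { |tag, _count| tag }
  end
end

ruby  = Tag.new("ruby")
rails = Tag.new("rails")
gems  = Tag.new("gems")
perl  = Tag.new("perl")

# ruby appears on 3 docs, rails on 2, gems and perl on 1 each
docs = [
  Doc.new([ruby, rails, gems]),
  Doc.new([ruby, rails]),
  Doc.new([ruby, perl]),
]

p TagStats.related_tags(docs).map(&:name)          # ["ruby", "rails"]
p TagStats.related_tags(docs, "ruby").map(&:name)  # ["rails"]
```

Because only the first 20 documents are scanned, a tag that shows up only on later documents is never counted, however often it appears there.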