dreikanter/30af1917e5356e0cc028
Last active December 5, 2015 13:07
N-grams counter
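require 'pp'
# Build a random word sequence. The range 0..LENGTH is inclusive, so SOURCE
# holds LENGTH + 1 words (101 here).
WORDS = %w(correct horse battery staple)
LENGTH = 100
SOURCE = (0..LENGTH).map { WORDS.sample }
puts "Source: #{SOURCE.join(', ')}"
###
NGRAM_LENGTH = 2
TOP_LENGTH = 10
###
# N-gram starting at index i, joined into a single string.
ngram = -> (i) { SOURCE[i, NGRAM_LENGTH].join(' ') }
# The last valid start index is SOURCE.size - NGRAM_LENGTH, which equals LENGTH - NGRAM_LENGTH + 1.
ngrams = (0..(LENGTH - NGRAM_LENGTH + 1)).map(&ngram)
# Count the occurrences of each n-gram.
counters = ngrams.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
# Print the TOP_LENGTH most frequent n-grams, most frequent first.
pp Hash[counters.sort_by { |_key, value| value }.last(TOP_LENGTH).reverse]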
Source: battery, correct, correct, correct, staple, battery, correct, staple, staple, correct, correct, correct, correct, horse, staple, staple, correct, correct, staple, horse, staple, correct, horse, staple, staple, staple, battery, horse, correct, battery, horse, battery, correct, horse, correct, correct, horse, horse, correct, battery, battery, correct, correct, battery, horse, battery, correct, staple, battery, battery, battery, correct, battery, correct, battery, staple, horse, staple, correct, correct, correct, horse, battery, correct, horse, correct, correct, battery, battery, battery, correct, staple, correct, staple, battery, staple, staple, battery, battery, battery, staple, correct, horse, battery, horse, correct, battery, horse, battery, battery, horse, correct, horse, battery, horse, horse, horse, staple, correct, correct, staple
{"correct correct"=>12,
 "battery correct"=>9,
 "battery battery"=>8,
 "correct horse"=>8,
 "battery horse"=>7,
 "correct staple"=>7,
 "correct battery"=>7,
 "staple correct"=>7,
 "horse correct"=>6,
 "horse battery"=>6}
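The counting step in the script above could also be wrapped in a reusable method. The following is only a sketch under stated assumptions, not the gist's own code: `top_ngrams` and its `n:` and `top:` keywords are hypothetical names, and it swaps the index-based lambda for `Enumerable#each_cons`, which yields every run of `n` consecutive words and so sidesteps the manual bounds arithmetic.

```ruby
# Hypothetical helper (not part of the original gist): counts the n-grams in
# a list of words and returns the `top` most frequent ones as a Hash.
def top_ngrams(words, n: 2, top: 10)
  counters = Hash.new(0)
  # each_cons(n) yields every window of n consecutive words; it yields
  # nothing when the list is shorter than n.
  words.each_cons(n) { |window| counters[window.join(' ')] += 1 }
  # Sort by descending count and keep the first `top` pairs. The relative
  # order of n-grams with equal counts is not guaranteed.
  counters.sort_by { |_ngram, count| -count }.first(top).to_h
end

sample = %w(correct horse battery staple correct horse)
p top_ngrams(sample, n: 2, top: 1)
# prints {"correct horse"=>2}
```

Passing the word list as an argument, instead of reading the `SOURCE` constant, makes it possible to run the counter on a fixed input and check the result, which a randomly generated sequence does not allow.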