Created
September 30, 2013 20:58
Very basic well-formedness test for English.
documents = Document.all(:limit => 10) # get some documents

# use map to iterate over the documents & return the fraction
# of each document's words that are in the spell check dictionary
results = documents.map do |doc|
  checked    = Spellchecker.check(doc.combined_page_text) # check the text
  correct    = checked.select { |entry| entry[:correct] } # get the correct words
  percentage = correct.size.to_f / checked.size           # fraction between 0 and 1 (NaN if the text has no words)
  puts "#{doc.id}: #{percentage} (#{correct.size} of #{checked.size})" # print out some status info
  # return the document's id and its stats
  [doc.id, { :percent => percentage, :correct => correct.size, :checked => checked.size }]
end

percentages = Hash[results] # look up stats by document id
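
The snippet above depends on an application's `Document` model and on `Spellchecker`, so it cannot be run on its own. The following is a minimal, self-contained sketch of the same idea for experimenting outside the app. Everything in it is an assumption made for illustration: `DICTIONARY` is a tiny hypothetical word list, `check_words` is a stand-in that only mimics the `:correct` flag the snippet relies on, and `well_formedness` is a hypothetical helper that returns `nil` for empty text instead of dividing by zero.

```ruby
require 'set'

# Hypothetical stand-in dictionary: a tiny word list for illustration only.
DICTIONARY = Set.new(%w[the quick brown fox jumps over lazy dog])

# Stand-in for the spell checker: one hash per word, with a :correct flag.
def check_words(text)
  text.downcase.scan(/[a-z']+/).map do |word|
    { :original => word, :correct => DICTIONARY.include?(word) }
  end
end

# Fraction of recognised words, or nil when the text contains no words.
def well_formedness(text)
  checked = check_words(text)
  return nil if checked.empty?
  correct = checked.count { |entry| entry[:correct] }
  { :percent => correct.to_f / checked.size, :correct => correct, :checked => checked.size }
end

# Sample "documents" keyed by id (made up for this sketch).
samples = { 1 => "The quick brown fox", 2 => "Teh qiuck brown fxo", 3 => "" }
scores  = Hash[samples.map { |id, text| [id, well_formedness(text)] }]
scores.each { |id, score| puts "#{id}: #{score.inspect}" }
```

With a real dictionary the ratio gives a rough signal of whether a text is recognisable English; the cutoff that separates clean text from garbled text is something to calibrate on your own data.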