Rack::SpellCheck
GitHub gist mixonic/1944060, created February 29, 2012 20:10
=begin
Rack::SpellCheck - Spell check your HTML pages with Aspell, Nokogiri, and Rack.

It is probably best loaded from an initializer and enabled only in
development (substitute your own application class for SpintoApp):

  if Rails.env.development?
    SpintoApp::Application.config.middleware.use Rack::SpellCheck
  end
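
`middleware.use` wraps the application. Rails builds the middleware with the next app in the stack, and the middleware's `call` sees every response as a Rack triple of status, headers, and body. The following minimal sketch shows that shape in plain Ruby and assumes nothing beyond the Rack calling convention. `HtmlTap` and `INNER_APP` are hypothetical names that are not part of Rack::SpellCheck or Rack, and the lambda stands in for the Rails app.

```ruby
# Hypothetical stand-in for the Rails app: any object with #call(env)
# that returns [status, headers, body] satisfies the Rack contract.
INNER_APP = lambda do |env|
  [200, { "Content-Type" => "text/html" }, ["<p>Helo ", "world</p>"]]
end

# Hypothetical middleware with the same shape as Rack::SpellCheck:
# it wraps the next app and inspects HTML responses on the way out.
class HtmlTap
  def initialize(app, &on_html)
    @app = app
    @on_html = on_html
  end

  def call(env)
    status, headers, body = @app.call(env)
    return [status, headers, body] unless headers["Content-Type"].to_s =~ /html/

    # Buffer the body through #each, close the original, reuse the parts.
    parts = []
    body.each { |part| parts << part }
    body.close if body.respond_to?(:close)
    @on_html.call(parts.join)
    [status, headers, parts]
  end
end

seen = []
app = HtmlTap.new(INNER_APP) { |html| seen << html }
status, headers, body = app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/" })
```

The middleware only reads the buffered text. The status, headers, and body parts reach the client unchanged.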

Add Nokogiri and Raspell to your Gemfile. Raspell binds to the Aspell C
library, so Aspell and an en_US dictionary must be installed too:

  group :development do
    gem 'nokogiri'
    gem 'raspell'
  end
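
The middleware uses Raspell's `Aspell` object only through `check(word)` and `suggest(word)`. That means a per-word cache of Aspell results, which is the idea behind the `@misspellings` hash, can be tried without Aspell installed. In this sketch, `StubSpeller` is a hypothetical stand-in with a made-up three-word dictionary and naive suggestions. `CachingChecker` is a hypothetical class that stores `nil` for a correct word and the suggestion list for a misspelled one.

```ruby
# Hypothetical stand-in for raspell's Aspell object: only #check and
# #suggest are used, and it counts calls so the caching is visible.
class StubSpeller
  attr_reader :calls

  def initialize(words)
    @words = words
    @calls = 0
  end

  def check(word)
    @calls += 1
    @words.include?(word.downcase)
  end

  def suggest(word)
    @calls += 1
    # Made-up suggestions: dictionary words sharing the first letter.
    @words.select { |w| w.start_with?(word[0].downcase) }
  end
end

# Hypothetical checker: caches one result per lowercased word,
# nil for a correct word and an array of suggestions otherwise.
class CachingChecker
  attr_reader :logged

  def initialize(speller)
    @speller = speller
    @cache = {}
    @logged = []
  end

  def check_word(word)
    key = word.downcase
    unless @cache.key?(key)
      @cache[key] = @speller.check(word) ? nil : @speller.suggest(word)
    end
    suggestions = @cache[key]
    @logged << "SpellCheck [#{word}]: #{suggestions[0..5].join(', ')}" if suggestions
  end
end

speller = StubSpeller.new(%w[hello help world])
checker = CachingChecker.new(speller)
%w[hello Helo world helo hello].each { |w| checker.check_word(w) }
```

The stub speller is asked four times for the five lookups, because the repeated `helo` and `hello` come from the cache. Within one page, spell_check also skips words it has already reported, so the middleware itself would only log `helo` again on a later request.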

Its output is interleaved with the usual request log. For each misspelled
word it logs up to six Aspell suggestions, then the time the whole check took:

  Started GET "/" for 127.0.0.1 at 2012-02-29 15:05:34 -0500
  Processing by Main::DashboardController#show as HTML
    Rendered main/dashboard/show.erb within layouts/marketing (5.0ms)
    Rendered layouts/_header.erb (31.3ms)
    Rendered layouts/_footer.erb (0.6ms)
  Completed 200 OK in 81ms (Views: 80.5ms)
  SpellCheck [workflow]: work flow, work-flow, workfare, workforce, workable, forkful
  SpellCheck [walkthrough]: walk through, walk-through, breakthrough, walkabout, Valkyrie, Walker
  SpellCheck [frontmatter]: front matter, front-matter, frontward, antimatter, fronted, frontier
  SpellCheck [subdomain]: sub domain, sub-domain, subhuman, subliming, subsuming, sideman
  SpellCheck [dns]: Dons, dens, dins, dons, duns, DNA
  SpellChecked in 0.244784 seconds.
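
Each SpellCheck line above comes from a word that survived spell_check's filtering. The following plain-Ruby sketch runs that filtering on a single string, without Nokogiri or Aspell. It strips URL-like runs, scans word-like tokens, normalizes curly and `&rsquo;` apostrophes, skips whitelisted words, and keeps each lowercased word once. `extract_words` is a hypothetical helper. Its patterns follow spell_check's, except that the URL suffix is optional so that a bare `example.org` is removed too.

```ruby
# Hypothetical helper mirroring spell_check's tokenizing on plain text.
URL_PATTERN = %r{[a-zA-Z0-9\.:/]+\.(?:co|net|org)[a-zA-Z0-9\.:/?&%]*}
WORD_PATTERN = %r{[A-Za-z\u2019'&;]+}

def extract_words(text, whitelist)
  reported = []
  words = []
  # Remove URL-like runs so domain names aren't checked.
  text.gsub(URL_PATTERN, '').scan(WORD_PATTERN) do |word|
    # Normalize curly and HTML-escaped apostrophes to a plain quote.
    word = word.gsub(%r{\u2019|&rsquo;}, "'")
    key = word.downcase
    # Whitelist matching is case-sensitive; de-duplication is not.
    next if whitelist.include?(word) || reported.include?(key)
    reported << key
    words << word
  end
  words
end

text = "Spinto\u2019s workflow: see spinto.com/docs or example.org. Workflow again."
words = extract_words(text, %w[Spinto's])
```

Here `Spinto’s` is whitelisted once its apostrophe is normalized. Both URLs are dropped, and the capitalized `Workflow` is treated as a repeat of `workflow`.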
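=end

class Rack::SpellCheck
  def initialize app
    @app = app
    # Aspell is only referenced here, so this file can be loaded in
    # environments where the development-only raspell gem is absent.
    @speller = Aspell.new("en_US")
    @speller.suggestion_mode = Aspell::NORMAL
    # Cache of Aspell results keyed by lowercased word (see check_word).
    @misspellings = {}
    # Project-specific words Aspell doesn't know. Matching is case-sensitive,
    # so list every capitalization that appears on the site.
    @whitelist = %w{
      Matt Beale Spinto Spinto's spinto
      matt beale www
      css js scss CSS SCSS SASS CoffeeScript coffeescript
      li td GitHub pre YAML CNAME
      yoursubdomain
    }
  end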
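
  def call env
    status, headers, body = @app.call(env)
    if headers["Content-Type"].to_s =~ /html/
      # Buffer the body through the Rack `each` contract so any body object
      # works, close the original, and pass the buffered parts downstream.
      parts = []
      body.each { |part| parts << part }
      body.close if body.respond_to?(:close)
      body = parts
      begin
        spell_check parts.join
      rescue StandardError => e
        # Never let a spelling problem break the page itself.
        Rails.logger.warn "SpellCheck failed: #{e.message}"
      end
    end
    [status, headers, body]
  end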

  def spell_check body
    started_at = Time.now
    # Parse full pages as documents and partial responses as fragments.
    dom = if body =~ /<body/
      Nokogiri::HTML.parse(body)
    else
      Nokogiri::HTML.fragment(body)
    end
    reported_words = []
    # Visit each text node once, skipping script and style contents. The
    # relative .// path searches from the parsed node, which also suits fragments.
    dom.xpath('.//text()[not(ancestor::script or ancestor::style)]').each do |node|
      next unless node.text.present?
      # Strip out URLs (with or without a path) so domain names aren't flagged.
      text = node.text.gsub(%r{[a-zA-Z0-9\.:/]+\.(?:co|net|org)[a-zA-Z0-9\.:/?&%]*}, '')
      # For each word.
      text.scan(%r{[A-Za-z\u2019'&;]+}) do |word|
        # Change HTML-escaped and UTF-8 apostrophes to single quotes.
        word.gsub!(%r{\u2019|&rsquo;}, "'")
        key = word.downcase
        # Report each word at most once per page.
        next if @whitelist.include?(word) || reported_words.include?(key)
        reported_words << key
        check_word word
      end
    end
    Rails.logger.info "SpellChecked in #{Time.now - started_at} seconds."
  end

  def check_word word
    key = word.downcase
    # Ask Aspell once per word: cache nil for correct words and the
    # suggestion list for misspellings, then log only misspellings.
    unless @misspellings.has_key?(key)
      @misspellings[key] = @speller.check(word) ? nil : @speller.suggest(word)
    end
    suggestions = @misspellings[key]
    log_misspelling word, suggestions if suggestions
  end

  # Log the word with up to six of Aspell's suggestions.
  def log_misspelling word, suggestions
    Rails.logger.warn "SpellCheck [#{word}]: #{suggestions[0..5].join ', '}"
  end
end