Created
February 22, 2009 21:55
tommorris/68628
Ruby en subtype detector
# a simple, ruby, regex-based EN subtype detector
# backstory: I spend a ridiculous amount of time editing text that's written in a
# variety of American and British English, to the point where it ends up screwing with my
# head and I get funny looks from my fellow Brits for writing American words.
# I wrote this so that I could pipe text I'm working on in Vim to it so I can see quickly
# which variant it's in.
def en_gb_or_us(text)
  # compiled from https://wiki.ubuntu.com/EnglishTranslation/WordSubstitution
  # and http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences
  brit_tests = [/\bcolour\b/, /\bfavourite\b/, /\bhonour\b/, /\barmour\b/, /\brumour\b/, /\benrolment\b/, /\bfulfil\b/, /\bskilful\b/, /\bcheque\b/, /\barse\b/, /\bmum\b/, /\bmam\b/, /\btitbit\b/, /\bpernickety\b/, /\baeroplane\b/, /\btheatre\b/, /\bgoitre\b/, /\blitre\b/, /\blustre\b/, /\bmitre\b/, /\bnitre\b/, /\breconnoitre\b/, /\bsaltpetre\b/, /\bspectre\b/, /\bcentre\b/, /\btitre\b/, /\bfibre\b/, /\bsabre\b/, /\bsombre\b/, /\bconnexion\b/, /\binflexion\b/, /\bdeflexion\b/, /\breflexion\b/, /\bgenuflexion\b/, /\bfoetal\b/, /\bfoetus\b/, /\banaemic\b/, /\banaemia\b/, /\banaesthesia\b/, /\banaesthetic\b/, /\bcaesium\b/, /\bdiarrhoea\b/, /\bdiarrhoeic\b/, /\bgynaecology\b/, /\bgynaecologist\b/, /\bhaemophilia\b/, /\bleukaemia\b/, /\boesophagus\b/, /\boestrogen\b/, /\bartefact\b/, /\bkerb\b/, /\bcypher\b/, /\bchequer\b/, /\bgaol\b/, /\bgaoler\b/, /\byoghurt\b/, /\bagendum\b/, /\badrenaline\b/, /\badaptor\b/, /\baluminium\b/, /\bdraught\b/, /\boenology\b/, /\bhomoeopathic\b/, /\bhomoeopathy\b/, /\bhomoeopath\b/, /\bcentimetre\b/, /\bnanometre\b/, /\btrouser[s]?\b/, /\bprise\b/, /\bjumper\b/, /\bpolo neck\b/, /\bdinner jacket\b/, /\bvapour\b/, /\bcourgette\b/, /\bwindscreen\b/]
  amer_tests = [/\bcolor\b/, /\bfavorite\b/, /\bhonor\b/, /\barmor\b/, /\brumor\b/, /\benrollment\b/, /\bmom\b/, /\btidbit\b/, /\bpersnickety\b/, /\bairplane\b/, /\btheater\b/, /\bgoiter\b/, /\bliter\b/, /\bluster\b/, /\bmiter\b/, /\bniter\b/, /\breconnoiter\b/, /\bsaltpeter\b/, /\bspecter\b/, /\bcenter\b/, /\btiter\b/, /\bfiber\b/, /\bsaber\b/, /\bsomber\b/, /\bpederast\b/, /\bfetal\b/, /\bfetus\b/, /\banesthesia\b/, /\banesthetic\b/, /\bcesium\b/, /\bdiarrhea\b/, /\bdiarrheic\b/, /\bgynecology\b/, /\bgynecologist\b/, /\bleukemia\b/, /\besophagus\b/, /\bestrogen\b/, /\bartifact\b/, /\bgray\b/, /\bgantlet\b/, /\bdonut\b/, /\bomelet\b/, /\bmollusk\b/, /\benology\b/, /\bcentimeter\b/, /\bnanometer\b/, /\bdiaper[s]?\b/, /\bresume\b/, /\brésumé\b/, /\bsneakers\b/, /\bT\-boned\b/, /\btuxedo\b/, /\bturnpike\b/, /\bturtleneck\b/, /\bvapor\b/, /\bzucchini\b/, /\bZIP code\b/, /\bwindshield\b/]
  # Each pattern counts at most once: +1 per British hit, -1 per American hit.
  rating = 0
  brit_tests.each do |pattern|
    rating += 1 if text =~ pattern
  end
  amer_tests.each do |pattern|
    rating -= 1 if text =~ pattern
  end
  return "British English" if rating > 0
  return "American English" if rating < 0
  # A zero rating means no evidence either way, or evenly balanced evidence.
  "Indistinguishable"
end
puts en_gb_or_us("This diaper armor is strong, so buy me a donut") # prints "American English"
puts en_gb_or_us("I'm going to visit the theatre tonight")         # prints "British English"
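
The detector above only returns a verdict, so it can be hard to tell which words tipped the balance. The sketch below is not part of the original gist. It assumes two hypothetical helpers, `variant_evidence` and `verdict`, and a deliberately tiny subset of the word lists. It collects the distinct matching words from each list and applies the same "one point per word" rule. Unlike the original, it matches case-insensitively, which is an assumption about what is wanted rather than a description of the gist's behaviour.

```ruby
# Hypothetical helpers for illustration only; not part of the original gist.
# The word lists are a small subset chosen to keep the example short.
BRIT_WORDS = %w[colour theatre centre aluminium].freeze
AMER_WORDS = %w[color theater center donut].freeze

# Returns the distinct words from each list that occur in the text.
def variant_evidence(text)
  # One case-insensitive, word-bounded alternation per list.
  brit = /\b(?:#{Regexp.union(BRIT_WORDS).source})\b/i
  amer = /\b(?:#{Regexp.union(AMER_WORDS).source})\b/i
  {
    british: text.scan(brit).map(&:downcase).uniq,
    american: text.scan(amer).map(&:downcase).uniq
  }
end

# Same scoring idea as en_gb_or_us: each distinct word counts once.
def verdict(evidence)
  diff = evidence[:british].size - evidence[:american].size
  return "British English" if diff > 0
  return "American English" if diff < 0
  "Indistinguishable"
end

evidence = variant_evidence("The Theatre is in the centre; no donut.")
puts verdict(evidence)
puts "British: #{evidence[:british].join(', ')}"
puts "American: #{evidence[:american].join(', ')}"
```

To use either version on text piped from Vim, the script would also need to read standard input, for example with `puts en_gb_or_us(STDIN.read)` in place of the two hard-coded calls. The buffer could then be sent to it with `:w !ruby detector.rb`, where `detector.rb` is a placeholder file name.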