
@postmodern
Created January 12, 2011 04:42
Uses Parslet to parse and sanitize obfuscated email addresses.
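
```ruby
#!/usr/bin/env ruby
require 'parslet'

class EmailParser < Parslet::Parser
  rule(:space)  { match('\s').repeat(1) }
  rule(:space?) { space.maybe }
  rule(:dash?)  { match('[_-]').maybe }

  # "@", or "at"/"AT" with an optional "_" or "-" on either side
  rule(:at) {
    str('@') |
    (dash? >> (str('at') | str('AT')) >> dash?)
  }

  # ".", or "dot"/"DOT" with an optional "_" or "-" on either side
  rule(:dot) {
    str('.') |
    (dash? >> (str('dot') | str('DOT')) >> dash?)
  }

  # One run of lowercase letters/digits, plus any trailing whitespace
  rule(:word)      { match('[a-z0-9]').repeat(1).as(:word) >> space? }
  # Since word already swallows trailing whitespace, in practice words
  # are joined only through the dot branch of this rule
  rule(:separator) { space? >> dot.as(:dot) >> space? | space }
  rule(:words)     { word >> (separator >> word).repeat }

  rule(:email) {
    (words >> space? >> at.as(:at) >> space? >> words).as(:email)
  }
  root(:email)
end

# Rewrites each node of the parse tree as plain text, then joins them
class EmailSanitizer < Parslet::Transform
  rule(:dot => simple(:dot), :word => simple(:word)) { ".#{word}" }
  rule(:at => simple(:at))                           { '@' }
  rule(:word => simple(:word))                       { word }
  rule(:email => sequence(:email))                   { email.join }
end

parser    = EmailParser.new
sanitizer = EmailSanitizer.new

# abort prints to STDERR and exits with status 1
abort "usage: #{$0} \"EMAIL_ADDR\"" unless ARGV[0]

begin
  puts sanitizer.apply(parser.parse(ARGV[0]))
rescue Parslet::ParseFailed => e
  abort "#{$0}: could not parse #{ARGV[0].inspect}: #{e.message}"
end
```
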
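
`EmailSanitizer` works node by node: a hash with `:dot` and `:word` becomes `".word"`, an `:at` hash becomes `"@"`, a bare `:word` hash becomes the word itself, and the `:email` array is joined. The plain-Ruby sketch below applies that same mapping to a hand-written tree so it can be tried without installing Parslet. The tree is an assumption about what `EmailParser` yields under `:email` for the input `x DOT y . z AT example DOT com` (a flat array of hashes), not captured Parslet output, and `sanitize_tree` is a hypothetical helper. Dispatching on the sorted key set mirrors the fact that a `Parslet::Transform` hash pattern only matches a hash with exactly those keys.

```ruby
# Hypothetical helper mirroring EmailSanitizer's rules on plain hashes.
# The tree shape it expects is an assumption, not real Parslet output.
def sanitize_tree(nodes)
  nodes.map { |node|
    case node.keys.sort
    when [:dot, :word] then ".#{node[:word]}"  # dot separator followed by a word
    when [:at]         then '@'                # "@", "AT", "-at-", ...
    when [:word]       then node[:word].to_s   # a word with no separator
    else raise ArgumentError, "unexpected node: #{node.inspect}"
    end
  }.join
end

# Assumed tree for "x DOT y . z AT example DOT com"
tree = [
  { word: 'x' },
  { dot: 'DOT', word: 'y' },
  { dot: '.',   word: 'z' },
  { at: 'AT' },
  { word: 'example' },
  { dot: 'DOT', word: 'com' }
]

puts sanitize_tree(tree)  # prints x.y.z@example.com
```
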
@postmodern (author):

```shell
$ ruby email_parser.rb "x DOT y . z AT example DOT com"
x.y.z@example.com
```
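
For comparison, the grammar's token rules can be approximated without Parslet by a small hand-written recursive-descent parser over Ruby's standard-library `StringScanner`. This is a hypothetical alternative: the `ObfuscatedEmail` class is not part of the gist, and it builds the output string directly instead of a tree. Like the PEG, it lets each word swallow trailing whitespace and gives back a dot separator that is not followed by a word. It has only been checked by hand against a few inputs like the one above, so treat it as a sketch rather than a drop-in replacement.

```ruby
require 'strscan'

# Hypothetical, stdlib-only approximation of EmailParser + EmailSanitizer.
# Same token rules, but the result string is built directly (no tree).
class ObfuscatedEmail
  SPACE = /\s+/
  AT    = /@|[_-]?(?:at|AT)[_-]?/     # "@", "at", "_AT_", "-at-", ...
  DOT   = /\.|[_-]?(?:dot|DOT)[_-]?/  # ".", "dot", "_DOT_", "-dot-", ...
  WORD  = /[a-z0-9]+/

  def self.sanitize(input)
    new(input).sanitize
  end

  def initialize(input)
    @s = StringScanner.new(input)
  end

  # email := words space? at space? words, consuming the whole input
  def sanitize
    local = words
    @s.skip(SPACE)
    expect(AT)
    @s.skip(SPACE)
    domain = words
    raise ArgumentError, "unexpected input at offset #{@s.pos}" unless @s.eos?
    "#{local}@#{domain}"
  end

  private

  # words := word (dot word)*, rejoined with "."
  def words
    parts = [word]
    loop do
      start = @s.pos
      break unless @s.scan(DOT)
      @s.skip(SPACE)
      unless @s.match?(WORD)
        @s.pos = start # give the separator back, as the PEG repeat would
        break
      end
      parts << word
    end
    parts.join('.')
  end

  # word := [a-z0-9]+ followed by optional whitespace
  def word
    token = expect(WORD)
    @s.skip(SPACE)
    token
  end

  # Scan pattern at the current position or fail with the offset
  def expect(pattern)
    token = @s.scan(pattern)
    raise ArgumentError, "expected #{pattern.inspect} at offset #{@s.pos}" unless token
    token
  end
end
```

With this sketch, `ObfuscatedEmail.sanitize("x DOT y . z AT example DOT com")` returns `"x.y.z@example.com"`, and malformed input such as `"x at"` raises `ArgumentError` instead of `Parslet::ParseFailed`.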
