@rolandcrosby
Created September 25, 2018 16:04
get place names and addresses from eater articles
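# Read the saved article and decode HTML-escaped ampersands.
t = File.read('sausage.html').gsub('&amp;', '&')
# Each match is either a section heading (<h3>) or a place: <strong>name</strong> (address).
matches = t.scan(/(?:<h3>([^<]*)<\/h3>)|<strong>([^<]*)<\/strong>\s*\(([^)]*)\)/)

# Strip the trailing phone number (or "no phone") and append a city/state suffix.
def addr(str)
  str = str.sub(/,\s+(no phone|\d{3}-\d{3}-\d{4}).*/, '')
  last = str.split(', ')[-1]
  return str if last == "NJ"
  return str + ", NY" if ["Brooklyn", "Queens", "Bronx", "Staten Island"].include? last
  str + ", New York, NY"
end

grouping = nil
matches = matches.map do |match|
  if match[0] != nil
    # Heading row: remember it as the current grouping and emit nothing.
    grouping = match[0].split.map(&:capitalize).join(' ')
    nil
  else
    [grouping, match[1], addr(match[2])]
  end
end.compact

# Print tab-separated rows: grouping, place name, address.
matches.each do |x|
  puts x.join("\t")
end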
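
To try the same approach without a saved copy of the article, here is a minimal sketch that runs the regex and the address cleanup on an inline HTML fragment. The fragment, the place names, the addresses, and the helper names `normalize_address` and `extract_places` are all made up for illustration. The markup shape is only an assumption inferred from the regex, not copied from the Eater page.

```ruby
# Hypothetical HTML fragment; every name and address here is invented.
SAMPLE_HTML = <<~HTML
  <h3>ITALIAN SAUSAGE</h3>
  <p><strong>Example Deli</strong> (123 Example St, Brooklyn, 718-555-0100)</p>
  <p><strong>Sample &amp; Sons</strong> (45 Sample Ave, no phone)</p>
  <h3>BRATWURST</h3>
  <p><strong>Placeholder Haus</strong> (9 Placeholder Rd, Hoboken, NJ, 201-555-0123)</p>
HTML

# Same pattern as the script: group 1 is a heading, groups 2 and 3 are name and address.
PATTERN = /(?:<h3>([^<]*)<\/h3>)|<strong>([^<]*)<\/strong>\s*\(([^)]*)\)/
BOROUGHS = ["Brooklyn", "Queens", "Bronx", "Staten Island"].freeze

# Non-mutating variant of the script's addr helper (hypothetical name).
def normalize_address(raw)
  str = raw.sub(/,\s+(no phone|\d{3}-\d{3}-\d{4}).*/, '')
  last = str.split(', ')[-1]
  return str if last == "NJ"
  return str + ", NY" if BOROUGHS.include?(last)
  str + ", New York, NY"
end

# Returns [grouping, name, address] rows; headings only update the grouping.
def extract_places(html)
  grouping = nil
  rows = []
  html.gsub('&amp;', '&').scan(PATTERN) do |heading, name, address|
    if heading
      grouping = heading.split.map(&:capitalize).join(' ')
    else
      rows << [grouping, name, normalize_address(address)]
    end
  end
  rows
end

extract_places(SAMPLE_HTML).each { |row| puts row.join("\t") }
```

Anything without a borough or "NJ" as its last component is assumed to be in Manhattan, which is probably the first thing to break on articles covering other cities.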
@rolandcrosby

This was only tested on https://ny.eater.com/2016/12/20/14027496/best-sausages-nyc, so who knows if it generalizes to other articles.
