@brendanc
Last active October 12, 2015 03:37
Scrape srr.org for Thursday night run results
require 'open-uri'
require 'nokogiri'

# This is the original parsing script that inspired this app.
# Left here for reference and debug/troubleshooting.
#
# Follows each Thursday night run result link found in the index page `doc`
# and prints the rows whose first cell matches `name_to_match`.
def crawl(doc, base_url, name_to_match)
  results = []
  doc.xpath('//a[starts-with(@href, "/events/thursday-night-run/")]').each do |node|
    href = node['href']
    next unless href.match(/\.html?$/)

    puts href
    # The file name, minus its extension, is used as the run date.
    date = href.split("/")[-1].sub(/\.html?$/, "")
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
    doc2 = Nokogiri::HTML(URI.open(base_url + href))
    doc2.xpath('//tr').each do |row|
      name = row.children[0] ? row.children[0].text : ""
      next unless name.match(/#{name_to_match}/)

      # children[2] is a Nokogiri node; take its text before concatenating.
      ran = row.children[2] ? row.children[2].text : ""
      results << name + " on " + date + " ran " + ran
    end
  end
  puts results.reverse
end

base_url = "http://srr.org"
name_to_match = "[insert name here]"
doc2013 = Nokogiri::HTML(URI.open(base_url + "/events/thursday-night-run/index.php"))
doc2012 = Nokogiri::HTML(URI.open(base_url + "/events/thursday-night-run/2012/index.php"))
crawl(doc2012, base_url, name_to_match)
crawl(doc2013, base_url, name_to_match)
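
The date and name handling in `crawl` can be tried out without any network access. The following is a minimal sketch, not part of the original script: `date_from_href` and `name_pattern` are hypothetical helper names, the sample paths are made up for illustration, and escaping the name with `Regexp.escape` assumes the name should be matched literally rather than as a regular expression.

```ruby
# Hypothetical helper: derive the run date from a result link the same way
# crawl does (last path segment with the .htm/.html extension removed).
# Returns nil for links that are not result pages.
def date_from_href(href)
  return nil unless href.match?(/\.html?$/)

  href.split("/")[-1].sub(/\.html?$/, "")
end

# Hypothetical helper: build a pattern that matches the name literally.
# Assumption: the name is plain text, not a regular expression.
def name_pattern(name)
  /#{Regexp.escape(name)}/
end

# Sample paths below are invented; the real site layout may differ.
puts date_from_href("/events/thursday-night-run/2012/2012-10-25.html")
puts date_from_href("/events/thursday-night-run/index.php").inspect
puts name_pattern("J. Smith").match?("J. Smith")
puts name_pattern("J. Smith").match?("JX Smith")
```

Without the escaping step, a name such as "J. Smith" would be interpolated as a regular expression, so the "." would match any character.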