@brendanc
Created November 6, 2012 04:15
Parse SRR Thursday night run results to CSV
require 'open-uri'
require 'nokogiri'
base_url = "http://srr.org"
# URI.open is used because Kernel#open no longer opens URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open("http://www.srr.org/events/thursday-night-run/index.php"))
divisions = ["Men's Open", "Women's Open", "Men's Masters", "Women's Masters"]
results = []
doc.xpath('//a[starts-with(@href, "/events/thursday-night-run/")]').each do |node|
  href = node['href']
  # only follow links to individual results pages
  if href.match(/\.htm$/)
    # the file name (minus extension) is the date of the run
    date = href.split("/")[-1].gsub(".htm","")
    doc2 = Nokogiri::HTML(URI.open(base_url + href))
    division = "not set"
    doc2.xpath('//tr').each do |row|
      if row.children[0] == nil
        next
      end
      name = row.children[0].text
      # skip weather or empty rows
      if name.gsub(/\s+/, "") == "" || name.downcase.match(/^weather/)
        next
      end
      # check if this is a division header, if so, set the division
      if divisions.include?(name)
        division = name
        next
      end
      # if any of these nodes are nil it's not a results row so let's move on with our lives
      if row.children[2] == nil || row.children[4] == nil || row.children[6] == nil
        next
      end
      time = row.children[2].text
      city = row.children[4].text
      notes = row.children[6].text
      results << "date: #{date}, name: #{name}, time: #{time}, city: #{city}, notes:#{notes}, division:#{division}"
    end
  end
end
puts results.reverse
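
The description mentions CSV, while the script prints labelled strings joined by commas, so a name or note that itself contains a comma would be ambiguous. As a minimal sketch (not the author's method), the same fields could be written with Ruby's `csv` library, which quotes such values. The `rows` data, the `HEADERS` constant, and the `results_to_csv` helper are hypothetical names introduced only for this illustration; the sample row is made up and is not a real result.

```ruby
require 'csv'

# Hypothetical sample row standing in for one scraped result (not real data).
rows = [
  { date: "2012-11-01", name: "Runner, Example", time: "25:30",
    city: "Somerville", notes: "first run", division: "Men's Open" }
]

# Column order mirrors the fields collected by the script.
HEADERS = %i[date name time city notes division]

# Hypothetical helper: builds a CSV string with a header line
# followed by one line per result hash.
def results_to_csv(rows, headers)
  CSV.generate do |csv|
    csv << headers.map(&:to_s)
    rows.each { |row| csv << headers.map { |h| row[h] } }
  end
end

csv_text = results_to_csv(rows, HEADERS)
puts csv_text
```

To adapt the script under this assumption, each result would be collected as a hash (or array) of fields instead of an interpolated string, and the CSV text written out once at the end.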