@craigw
Created April 8, 2010 13:30
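# Get a list of all MPs as a CSV file
require 'rubygems'
require 'hpricot'
require 'open-uri'

# Note: on Ruby 3.0+ open-uri no longer patches Kernel#open; call URI.open there instead.
doc = Hpricot(open('http://www.parliament.uk/mpslordsandoffices/mps_and_lords/alms.cfm').read)
# The MP listing is the fifth table on the page; collect its rows.
mps = ((doc / "table").to_a[4] / "tr").to_a
mps.shift # drop the first row (presumably the table header)
line_format = "%s,%s,%s,%s,%s,%s"
puts line_format % %W( Name Party Constituency Email Website Biography )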
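mps.each do |mp|
  name, constituency = (mp / "td text()").to_a
  next if constituency.nil?
  raw_name = name.to_s
  # Split "Name (Party)"; if the pattern does not match, keep the raw text and leave the party blank.
  name, party = raw_name.scan(/(.*) \((.*)\)/)[0] || [raw_name, ""]
  # Collect the optional links in the row, keyed by what the link text mentions.
  options = {}
  (mp / "td a").to_a.each do |a|
    case a.inner_html.to_s.strip
    when /website/i
      options[:website] = a['href']
    when /bio/i
      options[:bio] = a['href']
    when /email/i
      options[:email] = a['href']
    end
  end
  fields = [ name, party.upcase, constituency, *options.values_at(:email, :website, :bio) ]
  # Quote every field, doubling embedded double quotes so the CSV stays valid.
  puts line_format % fields.map { |f| '"' + f.to_s.strip.gsub('"', '""') + '"' }
end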
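
The script above builds each CSV line by hand with a format string. As a rough alternative sketch (not part of the original gist), Ruby's standard `csv` library can do the quoting and escaping. The helper name `mp_csv_line` and the sample row are hypothetical, made up for illustration only.

```ruby
require 'csv'

# Hypothetical helper (not in the original script): turns an array of
# values into one CSV line, quoting every field. nil becomes an empty
# string, and embedded double quotes are doubled by the CSV library.
def mp_csv_line(fields)
  CSV.generate_line(fields.map { |f| f.to_s.strip }, force_quotes: true)
end

# Header row, matching the column names used by the script.
puts mp_csv_line(%w[Name Party Constituency Email Website Biography])

# Made-up sample row showing a comma and quotes inside field values.
puts mp_csv_line(['Jane "JD" Doe', 'LAB', 'Example, North', nil, nil, nil])
```

In the scraping loop, the final `puts` could then call `mp_csv_line` with the same array of fields, assuming the rest of the script is left unchanged.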