@danielharan
Created October 4, 2008 20:14
require 'mechanize'

# One postal code per line, e.g. "A1A 1A1". File.read closes the file handle.
postal_codes = File.read("postal_codes.txt").split("\n")
# Randomize the order to make the pattern slightly harder to see in logs.
postal_codes = postal_codes.shuffle

# File name for a saved page: the postal code with its space removed.
def filename(postcode)
  postcode.sub(' ', '')
end

# Mechanize 1.0 and later dropped the WWW:: namespace.
@agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Safari'
  a.max_history = 1
end
# @page is not read anywhere below.
@page = @agent.get("http://www.conservative.ca/EN/1051")
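
# Fetch the CBC riding page for one postal code and save its body under pages/.
def scrape(postcode)
  # "A1A 1A1" splits into the forward sortation area and the local delivery unit.
  fsa, ldu = postcode.split(' ').collect {|e| e.downcase }
  page = @agent.get("http://www.cbc.ca/news/canadavotes/myriding/postalcodes/#{postcode[0..0].downcase}/#{fsa}/#{ldu}.html")
  File.open("pages/#{filename(postcode)}", "w") do |f|
    f.puts page.body
  end
end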
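
postal_codes.each do |postcode|
  begin
    # Skip postal codes already saved by an earlier run.
    # File.exists? was removed in Ruby 3.2; File.exist? works everywhere.
    next if File.exist?("pages/#{filename(postcode)}")
    scrape(postcode)
    # Pause 2 to 5 seconds between requests.
    sleep(2 + (rand(3_000) / 1_000.0))
  rescue StandardError
    # Back off, then try the same postal code again (retries indefinitely).
    sleep 20
    retry
  end
end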