@danielharan
Created September 24, 2008 20:05
require 'rubygems'
require 'mechanize'
# postal_codes.txt holds one postal code per line
postal_codes = File.read("postal_codes.txt").split("\n")
# randomize to make the pattern slightly harder to see in logs
postal_codes = postal_codes.shuffle
@agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Safari'
  a.max_history = 1
end
@page = @agent.get("http://anonymouse.org/cgi-bin/anon-www.cgi/http://www.elections.ca/home.asp")
def scrape(postcode)
  # fill in the postal code lookup form on the home page and submit it
  lookup = @page.form_with(:name => "POSTAL")
  lookup.field_with(:name => "PC").value = postcode
  search_results = @agent.submit(lookup)
  # write file
  File.open("pages/#{postcode}", "w") do |f|
    # ED= is in the URL so we save it here.
    # also saving the file in case the postal code corresponds to more than one EDID
    f.puts "<!-- #{search_results.uri} -->"
    f.puts search_results.body
  end
end
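# scrape writes into pages/, so make sure the directory exists
Dir.mkdir("pages") unless File.directory?("pages")
postal_codes.each do |postcode|
  postcode.gsub!(' ', '') # done in javascript on the site, "A1A1A1" is OK, "A1A 1A1" is NOT
  next if File.exist?("pages/#{postcode}") # already scraped on an earlier run
  scrape(postcode)
  sleep(1 + (rand(4_000) / 1_000.0)) # pause 1-5 seconds
end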
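# The saved pages start with a "<!-- uri -->" line because, as noted in scrape,
# ED= is in the URL. A minimal sketch of reading that value back might look like
# the block below. Both helper names are hypothetical, and the numeric
# "ED=<digits>" query-parameter format is an assumption about the result URL,
# not something the script above confirms.

```ruby
# Hypothetical helper: pull the ED value out of the "<!-- uri -->" header line.
# Assumes the district id appears as a numeric ED=<digits> query parameter.
# Returns the digits as a String, or nil when no ED parameter is present.
def edid_from_header(line)
  match = line.match(/[?&;]ED=(\d+)/i)
  match && match[1]
end

# Hypothetical wrapper: read only the first line of a saved page file.
def edid_for_saved_page(path)
  File.open(path) { |f| edid_from_header(f.gets.to_s) }
end
```

# A nil result would suggest the lookup did not land on a single district page,
# which is the case the saved body is kept around for.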