@henare
Created August 19, 2012 02:08
Scraper for NSW POEO Licences
require 'mechanize'
require 'date'

agent = Mechanize.new
base_url = 'http://www.environment.nsw.gov.au/prpoeoapp/'

# Choose the "issued licences" search on the landing page
first_page = agent.get(base_url)
form = first_page.form_with(name: 'form1')
form.radiobutton_with(value: 'optIssuedLicences').click
licence_search_page = form.submit

# Filter by licence status and run the search
form = licence_search_page.form_with(name: 'form1')
# Form field values are submitted as text, so set a string rather than an integer
form['ddlLicenceStatus'] = '2'
# 'btnSeach' is kept exactly as written; it presumably matches the button name used by the site
results_page = form.submit(form.button_with(name: 'btnSeach'))

# Visit each licence detail page and build a record from its summary table
detail_page_links = results_page.links_with(href: /^Detail\.aspx/)
detail_page_links.each do |l|
  detail_page = l.click

  record = {
    type: detail_page.at('#_LicenceDetails1_lblLicenceAppText').inner_text,
    number: detail_page.at('#_LicenceDetails1_lblLicenceNo').inner_text,
  }

  summary_table = detail_page.at('#_LicenceDetails1_divLicenceDetails').at('table')
  # Built from the first link in the summary table; not yet stored in the record
  download_link = base_url + summary_table.at('a')['href']

  summary_table.search('tr').each do |r|
    cells = r.search('td')
    # Only two-cell (label, value) rows are recorded
    next if cells.count != 2

    record_key = cells[0].inner_text.strip.downcase.gsub(' ', '_').gsub(':', '')
    # TODO: Does not handle multi-line values
    record[record_key.to_sym] = cells[1].inner_text.strip
  end

  p record
end
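
The TODO about multi-line values and the string-concatenated download link could be handled with a few small helpers. The following is only a sketch using the Ruby standard library. The names `normalise_key`, `clean_value` and `absolute_url` are hypothetical and not part of the original script. It assumes that line breaks inside a cell survive as newline characters in the extracted text, which may not hold if the page uses `<br>` tags, and joining the lines with ", " is an arbitrary choice.

```ruby
require 'uri'

# Hypothetical helper: turn a label cell such as "Licence Number:" into a symbol key.
# Unlike the inline version in the script, runs of whitespace collapse to one underscore.
def normalise_key(label)
  label.strip.downcase.delete(':').gsub(/\s+/, '_').to_sym
end

# Hypothetical helper: collapse a multi-line cell into a single string.
# Assumes the cell text contains newline characters between the lines.
def clean_value(text)
  text.split(/\r?\n/).map(&:strip).reject(&:empty?).join(', ')
end

# Hypothetical helper: resolve an href against the base URL.
# URI.join also copes with hrefs that are already absolute or start with "/".
def absolute_url(base_url, href)
  URI.join(base_url, href).to_s
end
```

Inside the row loop these could stand in for the inline string handling, for example `record[normalise_key(cells[0].inner_text)] = clean_value(cells[1].inner_text)`, and `absolute_url(base_url, href)` could replace `base_url + href`.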