@o-sam-o
Created September 10, 2011 04:23
Scrape Mackay City Council Development Applications
require 'nokogiri'
require 'open-uri'
require 'date'
require 'awesome_print'

# Collapse line breaks, tabs and repeated spaces into single spaces.
def clean_whitespace(a)
  a.gsub(/[\r\n\t]/, ' ').squeeze(' ').strip
end

base_url = 'http://planning.mackay.qld.gov.au/masterview/Modules/Applicationmaster/'
url = "#{base_url}default.aspx?page=found&1=lastmonth" # add &6=T to see determined
# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open(url))

# Visit each application linked from the results page and collect its details.
das = doc.xpath("//a[contains(@href,'&key=')]").collect do |approval_anchor|
  approval_link = "#{base_url}#{approval_anchor['href']}"
  approval_page = Nokogiri::HTML(URI.open(approval_link))
  details = approval_page.at_css('#lblDetails').inner_text.strip

  page_info = {}
  page_info[:council_reference] = $1 if clean_whitespace(approval_page.at_css('.ControlHeader').inner_text) =~ /([A-Z]+ - \d+ - \d+)/
  page_info[:info_url] = approval_link
  page_info[:description] = $1 if details =~ /Description: (.+)Submitted:/
  page_info[:date_received] = $1 if details =~ /Submitted: (.+)/
  page_info[:address] = clean_whitespace(approval_page.at_css('#lblProp').inner_text)
  page_info[:date_scraped] = Date.today.to_s
  # Note: '//a' searches the whole page, not only .ControlContent; use './/a' to restrict it.
  page_info[:comment_url] = approval_page.at_css('.ControlContent').at_xpath("//a[contains(@href, 'mailto:')]")['href']
  page_info
end
ap das
puts 'Done'
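
The description and submission date are both pulled out of the #lblDetails text with regular expressions, which is hard to check without hitting the council site. The sketch below shows one way to try that step offline. The parse_details helper and the sample string are made up for illustration and are not part of the original script; the non-greedy match and the extra strip are small assumptions about what the page text looks like.

```ruby
# Same whitespace normalisation as the scraper above.
def clean_whitespace(text)
  text.gsub(/[\r\n\t]/, ' ').squeeze(' ').strip
end

# Hypothetical helper (not in the original gist): extracts the description
# and submission date from the text of the #lblDetails element.
def parse_details(text)
  details = clean_whitespace(text)
  info = {}
  # Non-greedy match so the description stops at the first "Submitted:".
  info[:description]   = $1.strip if details =~ /Description: (.+?)Submitted:/
  info[:date_received] = $1.strip if details =~ /Submitted: (.+)/
  info
end

# Made-up sample text, assuming the label layout the regexes expect.
sample = "Description: Material Change of Use\r\n\tSubmitted: 01/09/2011"
p parse_details(sample)
```

Normalising the whitespace first means a line break between the two labels does not stop the description regex from matching, since "." does not match newlines by default. Text with neither label simply yields an empty hash.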