Scrape Bill Files from Colorado General Assembly website
require 'fileutils'
require 'open-uri'
require 'nokogiri'
require 'mechanize'

BASE_URL = "http://leg.colorado.gov"
BILLS_DIR = "files/bills"

# Collect the bill numbers listed on the bill search page.
def get_bill_nums
  doc = Nokogiri::HTML(URI.open("#{BASE_URL}/bill-search"))
  # Splitting on whitespace assumes bill numbers contain no spaces.
  doc.css("div.field-name-field-bill-number div.field-items").text.split
end

bill_nums = get_bill_nums
puts bill_nums

FileUtils.mkdir_p(BILLS_DIR)
agent = Mechanize.new # one agent reused for every download

bill_nums.each do |bill_num|
  path = "#{BILLS_DIR}/#{bill_num}.pdf"
  next if File.exist?(path) # already downloaded

  doc = Nokogiri::HTML(URI.open("#{BASE_URL}/bills/#{bill_num}"))
  link = doc.css("div.recent-bill-text a").first
  next if link.nil? # no bill text posted for this bill

  agent.get(link["href"]).save(path)
end