@techbelly
Created February 15, 2011 00:34
Scraper syntax
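#!/usr/bin/env ruby
require 'gouge'

# Front page: collect story links, then follow only the stories whose
# headline contains a phrase in single quotes.
Gouge::Scraper.construct "BBCNewsHome" do
  load "http://www.bbc.co.uk/news/"
  # Map each story link's href to its link text.
  stories = make_hash('//a[@class="story"]', '@href', 'text()')
  stories.each do |href, headline|
    puts href, headline
    # Same test as /.*'[^']+'.*/: the surrounding .* never changes whether it matches.
    if headline =~ /'[^']+'/
      now_scrape "BBCNewsPage", href
    end
  end
end

# Story page: %s is presumably filled with the href passed to now_scrape.
Gouge::Scraper.construct "BBCNewsPage" do
  load "http://www.bbc.co.uk/%s"
  title = first '//h1[@class="story-header"]'
  paragraph = first '//p[@class="introduction"]'
  puts [self.url, title, paragraph].inspect
end

BBCNewsHome.create!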
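The gist shows only the calling side of the syntax. One plausible way a `construct "Name" do ... end` DSL could work is sketched below in plain Ruby: `construct` defines a top-level class with that name, `create!` runs the block against a fresh instance via `instance_exec`, `load` fills `%s` from the arguments passed by `now_scrape`, and `now_scrape` looks the next scraper up by name at run time (which is why `BBCNewsPage` can be defined after `BBCNewsHome`). This is a hypothetical sketch, not gouge's actual implementation: `MiniScraper`, `VISITED` and the example.com URLs are invented for illustration, nothing is fetched, and `make_hash`/`first` are left out because they would need an HTML parser.

```ruby
# Hypothetical sketch of the DSL mechanics; NOT gouge's real code.
module MiniScraper
  VISITED = [] # every URL passed to load, in order (for demonstration)

  class Scraper
    attr_reader :url # mirrors self.url in the gist

    # Define a top-level class called `name` whose instances run `body`.
    def self.construct(name, &body)
      klass = Class.new(self) { define_method(:body) { body } }
      Object.const_set(name, klass)
    end

    # Instantiate with optional template arguments and run the block.
    def self.create!(*args)
      new(*args).tap(&:run)
    end

    def initialize(*args)
      @args = args
    end

    # Evaluate the DSL block with this instance as self.
    def run
      instance_exec(&body)
    end

    # Fill %s placeholders from the constructor arguments.
    # A real scraper would fetch and parse @url here; this one only records it.
    def load(template)
      @url = format(template, *@args)
      VISITED << @url
    end

    # Resolve the named scraper lazily and run it with the given arguments.
    def now_scrape(name, *args)
      Object.const_get(name).create!(*args)
    end
  end
end

MiniScraper::Scraper.construct "DemoHome" do
  load "http://example.com/news/"
  %w[story-1 story-2].each { |slug| now_scrape "DemoPage", slug }
end

MiniScraper::Scraper.construct "DemoPage" do
  load "http://example.com/%s"
end

DemoHome.create!
p MiniScraper::VISITED
# prints ["http://example.com/news/", "http://example.com/story-1", "http://example.com/story-2"]
```

Resolving scrapers by name inside `now_scrape`, rather than by holding a class reference, is what lets the definitions appear in any order as long as every one exists before `create!` runs.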
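The headline filter keeps titles containing a single-quoted phrase; `/'[^']+'/` selects exactly the same titles as `/.*'[^']+'.*/`, since the surrounding `.*` can match nothing. The sample headlines below are made up for illustration. They show one caveat: any two apostrophes count as quotes, so a headline with two possessives also passes.

```ruby
# Matches text with a quoted phrase: a quote, one or more non-quote chars, a quote.
QUOTED = /'[^']+'/

# Invented sample headlines, not real scraped data.
headlines = [
  "Minister calls cuts 'unavoidable'",
  "Snow disrupts rail services",
  "Fan's ticket anger",          # one apostrophe: no match
  "Fans' fury at players' pay"   # two apostrophes: false positive
]

matches = headlines.grep(QUOTED)
p matches
# prints ["Minister calls cuts 'unavoidable'", "Fans' fury at players' pay"]
```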