@allolex
Created September 11, 2012 08:09
An example of a tiny, modular, and cleanly written screen scraper
require 'mechanize'

module Scraper
  # Collects the text of every post on a Tumblr-style page.
  class Tumblr
    attr_accessor :fake_browser, :results

    def initialize(url)
      # Present ourselves as a desktop browser rather than a bot.
      @fake_browser = Mechanize.new do |browser|
        browser.user_agent_alias = 'Mac Safari'
      end
      @results = []

      # Fetch the page and keep the text of each <article class="post">.
      @fake_browser.get(url) do |page|
        posts = page.parser.css('article.post')
        posts.each do |content|
          @results << content.text
        end
      end
    end
  end

  class SomeRandomDataSource
    # Some other code specific to some other source
  end
end

my_scraper = Scraper::Tumblr.new("http://allolex.net")
puts my_scraper.results
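
Because `Scraper::Tumblr` fetches the page inside its constructor, it can only be exercised with network access. One way to keep the same modular shape while making the extraction step runnable offline is to inject the fetcher. The sketch below is an illustration, not part of the original gist: the class name `InjectedSource`, the `fetcher:` keyword, the `scrape` method, and the stub HTML are all hypothetical. The regular expression is only a dependency-free stand-in for `page.parser.css('article.post')` and is not a robust HTML parser.

```ruby
module Scraper
  # Hypothetical variant of Scraper::Tumblr: the page fetcher is injected,
  # so the extraction logic can run without network access.
  class InjectedSource
    attr_reader :results

    # fetcher: any object responding to #call(url) and returning an HTML string.
    def initialize(url, fetcher:)
      @url = url
      @fetcher = fetcher
      @results = []
    end

    # Fetches the page and stores the text of each <article class="post">.
    # The regex stands in for a real CSS selector (assumption for this sketch).
    def scrape
      html = @fetcher.call(@url)
      @results = html.scan(%r{<article class="post">(.*?)</article>}m)
                     .flatten
                     .map { |text| text.gsub(/<[^>]+>/, '').strip }
      self
    end
  end
end

# Stub fetcher that returns canned HTML instead of making a request.
stub_fetcher = lambda do |_url|
  '<article class="post"><p>First post</p></article>' \
  '<article class="post"><p>Second post</p></article>'
end

scraper = Scraper::InjectedSource.new('http://example.invalid', fetcher: stub_fetcher).scrape
puts scraper.results
```

In real use the fetcher could wrap a Mechanize agent and return the page body, while tests pass a stub like the one above.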