@arockwell
Created November 10, 2008 22:53
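require 'hpricot'
require 'open-uri'

# Small helpers for scraping a page with Hpricot.
# Note: on Ruby 3.0+ open-uri no longer patches Kernel#open;
# call URI.open(url) instead of open(url) there.
module HpricotScraper
  # Returns the elements whose id is +div+ on the page at +url+.
  def self.get_div_contents(url, div)
    doc = Hpricot(open(url))
    doc / "##{div}"
  end

  # Returns the href of every link on the page at +url+ that does not
  # start with "http" and does not end in ".pdf".
  def self.get_links(url)
    doc = Hpricot(open(url))
    results = []
    (doc / 'a').each do |link|
      href = link[:href]
      next if href.nil? # skip anchors that have no href attribute
      results << href unless href =~ /^http/ || href =~ /\.pdf$/
    end
    results
  end
end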
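
The link filter in `get_links` can be tried without Hpricot or a network connection. The sketch below applies the same two patterns to plain strings. `relative_non_pdf_links` and the sample hrefs are hypothetical names made up for illustration and are not part of the gist.

```ruby
# Hypothetical helper (not part of the gist): the same filter that
# HpricotScraper.get_links uses, applied to an array of href strings.
def relative_non_pdf_links(hrefs)
  # Drop nil entries (anchors without an href), then reject hrefs that
  # start with "http" or end in ".pdf".
  hrefs.compact.reject { |href| href =~ /^http/ || href =~ /\.pdf$/ }
end

# Made-up sample data for illustration.
sample = ['/about', 'http://example.com/', 'docs/manual.pdf', nil, 'contact.html']
p relative_non_pdf_links(sample) # prints ["/about", "contact.html"]
```

Against a live page, the module itself would be called as `HpricotScraper.get_links(url)` and should return the hrefs that pass the same filter.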