@DemitryT
Created August 16, 2011 19:19
Simple Ruby program to crawl a specific page for links
#!/usr/bin/env ruby
require 'rubygems'
require 'anemone'
# Uses the anemone gem to crawl a website. Anemone keeps crawled pages in
# memory by default; install one of these only if you want persistent storage:
# 1) Redis (>= 2.0.0)
# 2) MongoDB
# 3) TokyoCabinet
# 4) PStore
# Crawls a website for specific info for Hazzo marketing purposes.
# REPLACE URL WITH THE ONE YOU WOULD LIKE TO CRAWL
URL = "http://www.groupon.com/denver/deals/glass-n-fire?utm_source=twitter&utm_medium=Social&utm_campaign=groupondenver"
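links = [] # every link collected from the pages that pass the filter

Anemone.crawl(URL) do |anemone|
  anemone.on_every_page do |page|
    # Skip pages whose URL contains "/status".
    # Alternative filters tried earlier:
    #   page.url.to_s =~ /http:\/\/gr\.pn\//
    #   page.url.to_s =~ /GrouponDenver/
    unless page.url.to_s.include?("/status")
      puts page.links
      links.concat(page.links)
    end
  end

  anemone.after_crawl do |_pages|
    puts "\nRESULTS:"
    puts "-----------------------------------------"
    # uniq! returns nil when there are no duplicates, so use uniq instead.
    puts "Total number of links found: #{links.uniq.size}"
    puts "\n"
  end
end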
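
The "/status" check above can be tried out without running a crawl. The following is a minimal sketch, assuming you want to filter the collected links themselves rather than the page URL as the script does. `filter_links` and the example.com URLs are hypothetical names used for illustration; they are not part of the gist or of Anemone.

```ruby
require 'uri'

# Hypothetical helper (not part of the gist or of Anemone): converts links to
# strings, drops any that contain an excluded fragment, and removes duplicates.
def filter_links(links, excluded = ['/status'])
  links.map(&:to_s)
       .reject { |link| excluded.any? { |fragment| link.include?(fragment) } }
       .uniq
end

# Made-up sample data standing in for page.links (an array of URI objects).
sample = [
  URI('http://example.com/deals'),
  URI('http://example.com/user/status/123'),
  URI('http://example.com/deals')
]

puts filter_links(sample)
```

Because the helper returns a new array via uniq (not uniq!), the result is never nil, even when the input has no duplicates.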