@venj
Created February 11, 2012 05:04
Fetch download links from epubshare.com
#!/usr/bin/env ruby
# Walks the paginated index of epubshare.com and writes each entry's title,
# followed by its tab-indented download links, to links.txt.
require "rubygems"
require "open-uri"
require "hpricot"

website = "http://www.epubshare.com"
base_uri = "#{website}/page/"
mainpage_index = 1
file = File.open("links.txt", "w+")

while true
  puts "Opening #{base_uri}#{mainpage_index}/..."
  html = open("#{base_uri}#{mainpage_index}/").read
  maindoc = Hpricot(html)
  entries = maindoc.search("//h2[@class='entry-title']/a")
  # An index page without entries is treated as the end of the listing.
  if entries.size == 0
    puts "Finished."
    break # leave the loop so that file.close below still runs
  end
  entries.each do |entry|
    page_uri = entry.attributes["href"]
    #puts entry.inner_html
    file.puts entry.inner_html
    begin
      puts "Opening #{page_uri}..."
      page_doc = Hpricot(open(page_uri).read)
    rescue StandardError => e
      puts "Unknown error: #{e.message}"
      break # give up on the remaining entries of this index page
    end
    filename_base = File.basename(page_uri)
    puts "Processing #{filename_base} page..."
    page_doc.search("//div[@class='entry-content']/p/a").each do |a|
      link = a.attributes["href"]
      next if link.nil? # anchor without an href
      # Skip Amazon product/image links and links back to the site itself.
      unless link.index("www.amazon.com") || link.index("ecx.images-amazon.com") || link.index("www.epubshare.com")
        #puts "\t" + link
        file.puts "\t" + link
      end
    end
  end
  file.flush
  mainpage_index += 1
end
file.close
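
The link filter in the script can also be pulled out into a small, separately testable helper. The following is a minimal sketch, not part of the original gist: the names `SKIPPED_HOSTS` and `download_link?` are hypothetical, and it compares the parsed host instead of doing a substring search, so it behaves slightly differently (relative or malformed links are dropped, and a skipped host name appearing only in a query string no longer hides a link).

```ruby
require "uri"

# Hosts the script skips, taken from its substring checks.
SKIPPED_HOSTS = %w[www.amazon.com ecx.images-amazon.com www.epubshare.com].freeze

# Hypothetical helper: true when +link+ is an absolute URL whose host is not
# in SKIPPED_HOSTS. Returns false for nil, blank, relative, or malformed links.
def download_link?(link)
  return false if link.nil? || link.strip.empty?
  host = URI.parse(link.strip).host
  !host.nil? && !SKIPPED_HOSTS.include?(host.downcase)
rescue URI::InvalidURIError
  false
end

# Example input; example.com stands in for a real file host.
links = [
  "http://www.amazon.com/dp/0000000000",
  "http://example.com/files/book.epub",
  nil,
  "http://www.epubshare.com/page/2/"
]
kept = links.select { |l| download_link?(l) }
puts kept
```

With a helper like this, the inner loop of the script could call `download_link?(link)` in place of the chained `link.index(...)` checks.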