@rxw1
Last active December 30, 2015 20:49
Create JSON of all books from it-ebooks.info.
#!/usr/bin/env ruby
# -*- encoding: utf-8 -*-
#
# Thu Nov 28 15:47:21 CET 2013
# 2013 (c) René Wilhelm <[email protected]>
#
# jq '.publisher + ": " + .name + " (" + .author + ")"' it-books.json
# Todo: Exit condition
# Implement a way to exit when done, after approx. 1300 repetitions.
require 'fileutils'
require 'json'
require 'mechanize'
require 'nokogiri'
require 'open-uri'
require 'ostruct'
url = "http://it-ebooks.info/"
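# Scrape one book page and return its metadata as an OpenStruct.
def getBook(url)
  page = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer opens URLs on Ruby 3+
  # Collect every microdata field as [itemprop name, element text].
  fields = page.css('[itemprop]').map { |x| [x['itemprop'], x.text] }
  h = fields.to_h # on duplicate names the last value wins
  # The download link sits in the second cell of the 11th table row.
  h['download'] = page.search('//tr[11]/td[2]/a').first['href']
  h['url'] = url
  OpenStruct.new(h)
end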
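# Append one book as a pretty-printed JSON object. The file ends up as a
# stream of concatenated objects, which jq can read directly.
def writeJSON(b)
  File.open("it-books.json", "a") do |f|
    f.puts(JSON.pretty_generate(b.to_h))
  end
end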
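# Download the book's PDF to "<publisher>/<author surnames> <year> <title>.pdf".
def downloadPDF(b)
  agent = Mechanize.new
  agent.get(b.url) do |page|
    pdf = page.link_with(:href => b.download).click
    path = "#{b.publisher}/"
    surnames = b.author.split(",").map { |x| x.split.last.capitalize }.join(" ")
    filename = "#{surnames} #{b.datePublished} #{b.name}.pdf"
    FileUtils.mkdir_p(path) # no-op when the directory already exists
    File.open(path + filename, "wb") do |f| # binary mode: no newline or encoding translation
      f.write(pdf.body)
    end
  end
end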
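# Arguments: PATTERN YEAR
# Metadata for every book is appended to it-books.json. A book is also
# downloaded when its lowercased description matches PATTERN (a regular
# expression) and it was published in YEAR or later. Without PATTERN,
# nothing is downloaded.
pattern = ARGV[0] && Regexp.new(ARGV[0])
min_year = ARGV[1].to_i
n = 0
begin
  loop do
    n += 1
    b = getBook(url + "book/#{n}")
    writeJSON(b)
    if pattern && b.description.to_s.downcase =~ pattern && b.datePublished.to_i >= min_year
      downloadPDF(b)
      puts "#{b.datePublished} [#{b.publisher}] #{b.name} (#{b.author})"
    end
    printf("Books processed: \t%s\r", n)
    sleep rand(1..(Math::PI))
  end
rescue
  # Any StandardError (missing page, parse failure) skips to the next id.
  # There is still no exit condition (see the Todo above); stop with Ctrl-C.
  retry
end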
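
The Todo in the header asks for a way to exit once all books have been fetched. One possible approach, sketched here and not part of the original script, is to stop after a run of consecutive failures instead of retrying forever. The helper name `each_book_until_exhausted` and the `max_misses` parameter are hypothetical, and the block stands in for the `getBook`/`writeJSON`/`downloadPDF` calls.

```ruby
# Hypothetical helper (an assumption, not the author's code): walks the ids
# 1, 2, 3, ... and stops after `max_misses` failed fetches in a row.
# Returns the number of ids that were handled without an error.
def each_book_until_exhausted(max_misses: 20)
  n = 0
  misses = 0
  found = 0
  while misses < max_misses
    n += 1
    begin
      yield n      # caller fetches and processes book n; may raise
      found += 1
      misses = 0   # any success resets the failure streak
    rescue StandardError => e
      misses += 1
      warn "book #{n}: #{e.class}: #{e.message}"
    end
  end
  found
end

# Stand-in fetcher for illustration: ids 1..5 exist, everything else fails.
count = each_book_until_exhausted(max_misses: 3) do |n|
  raise IOError, "no such book" if n > 5
end
puts count # prints 5
```

Counting consecutive misses rather than stopping at the first error tolerates gaps in the id sequence and transient network failures, at the cost of a few extra requests at the end of the run.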