@signalpillar
Created July 18, 2013 07:11
Idea for a scraper that saves web pages to PDF
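# Requires: python, htmldoc, pdftk
import os
from subprocess import Popen

pdfs = []
DEPTH = 20

# Render chapters 1 through DEPTH - 1, one PDF per chapter.
# range() is used instead of xrange() so this runs on Python 2 and 3.
for i in range(1, DEPTH):
    p = Popen(['htmldoc', '--webpage', '-f', '%d.pdf' % i,
               'http://www.djangobook.com/en/2.0/chapter%02d/' % i])
    p.wait()
    # Keep only the chapters for which htmldoc actually wrote a file.
    if os.path.exists('%d.pdf' % i):
        pdfs.append('%d.pdf' % i)

# Concatenate the chapter PDFs, in order, into all.pdf.
if len(pdfs) > 0:
    cmds = ['pdftk']
    cmds.extend(pdfs)
    cmds.extend(['cat', 'output', 'all.pdf'])
    p = Popen(cmds)
    p.wait()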
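
The same idea can be split into small functions so that building the commands is separate from running them. This is only a sketch: the names `htmldoc_cmd`, `pdftk_cmd`, `build_book`, and the `run` parameter are hypothetical and not part of the original script, and the htmldoc and pdftk arguments are exactly the ones used above.

```python
import os
import subprocess

# Same URL pattern as the original script.
BASE_URL = 'http://www.djangobook.com/en/2.0/chapter%02d/'


def htmldoc_cmd(chapter, base_url=BASE_URL):
    """Return (command, output file) for rendering one chapter with htmldoc."""
    out = '%d.pdf' % chapter
    return ['htmldoc', '--webpage', '-f', out, base_url % chapter], out


def pdftk_cmd(pdfs, output='all.pdf'):
    """Return the pdftk command that concatenates pdfs into output."""
    return ['pdftk'] + list(pdfs) + ['cat', 'output', output]


def build_book(first, last, run=subprocess.call):
    """Render chapters first..last (inclusive) and merge whatever was produced.

    run is any callable that takes a command list; it defaults to
    subprocess.call, which waits for the command to finish.
    """
    pdfs = []
    for chapter in range(first, last + 1):
        cmd, out = htmldoc_cmd(chapter)
        run(cmd)
        # As in the original, a chapter counts only if its PDF exists.
        if os.path.exists(out):
            pdfs.append(out)
    if pdfs:
        run(pdftk_cmd(pdfs))
    return pdfs
```

Calling `build_book(1, 19)` would cover the same chapters as the loop above. Passing a different `run` callable (for example `print`) shows the commands without executing them.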