@xatier
Last active August 29, 2015 14:03
rip images from ねこ@ふたば <3
#!/usr/bin/env python3
########################################################################
#
# A ねこ@ふたば image crawler
#
# License: [GPL](http://www.gnu.org/copyleft/gpl.html)
#
# Usage: ./c.py > foo.html && chromium foo.html
#
########################################################################
import lxml.html
BASE = "http://may.2chan.net/27/"
main = lxml.html.parse(BASE + "futaba.htm")
# collect links from the index page, keeping only hrefs that contain "res"
pages = [x.get('href') for x in main.xpath('/html/body/form[2]/a')
         if "res" in (x.get('href') or '')]
img_list = []
for p in pages:
    res = lxml.html.parse(BASE + p)
    # first one
    first = res.xpath('/html/body/form[2]/a[2]')
    if first:
        img_list.append(first[0].get('href'))
    # the rest
    pics = res.xpath('/html/body/form[2]/table/tr/td[2]/a[3]')
    for x in pics:
        img_list.append(x.get('href'))
# emit one <img> tag per collected link
for img in img_list:
    print("<img src=\"" + img + "\">")
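
The script writes each href straight into an `<img>` tag. If some hrefs turn out to be relative to `BASE`, or contain characters such as `&`, the generated page may not load them correctly. Below is a minimal sketch of one way to handle both cases, using only the standard library. `to_img_tag` is a hypothetical helper and not part of the original script, and the sample hrefs are made up for illustration.

```python
from html import escape
from urllib.parse import urljoin

BASE = "http://may.2chan.net/27/"  # same base URL as the script above


def to_img_tag(href, base=BASE):
    """Hypothetical helper: resolve href against base and return an escaped <img> tag."""
    url = urljoin(base, href)  # absolute hrefs pass through unchanged
    return '<img src="{}">'.format(escape(url, quote=True))


if __name__ == "__main__":
    # made-up sample hrefs: one relative, one absolute with a query string
    for href in ["src/123.jpg", "http://example.com/a.png?x=1&y=2"]:
        print(to_img_tag(href))
```

In the crawler, the final loop could call `print(to_img_tag(img))` instead of concatenating strings by hand.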