@mebusw
Last active May 15, 2017 13:02
Using Scrapy to crawl http://books.toscrape.com/ iteratively
import scrapy


class BookSpider(scrapy.Spider):
    name = "books"
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Each book on a listing page sits in an <article class="product_pod">.
        for book in response.css('article.product_pod'):
            name = book.xpath('./h3/a/@title').extract_first()
            price = book.css('p.price_color::text').extract_first()
            yield {
                'name': name,
                'price': price,
            }

        # Follow the "next" pager link, if present, and parse that page the same way.
        for url in response.xpath('//ul[@class="pager"]/li[@class="next"]'):
            u = url.xpath('./a/@href').extract_first()
            yield scrapy.Request(response.urljoin(u), callback=self.parse)
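
The pager's href is relative, which is why the spider passes it through response.urljoin before requesting it. The sketch below uses the standard library's urljoin to illustrate that kind of resolution without running Scrapy. The helper name next_page_url and the two sample href values are assumptions made for illustration, not taken from the gist.

```python
from urllib.parse import urljoin


def next_page_url(current_url, href):
    """Resolve a relative 'next' href against the page it was found on."""
    return urljoin(current_url, href)


# Hypothetical href values, chosen to show that the same helper works
# whether the link is relative to the site root or to a subdirectory.
first = next_page_url("http://books.toscrape.com/", "catalogue/page-2.html")
second = next_page_url(first, "page-3.html")

print(first)
print(second)
```

Resolving against the current page URL rather than a fixed base keeps the crawl correct when later pages live in a subdirectory and link to each other with bare file names.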