@csakis
Forked from stummjr/splash-spider.py
Created June 6, 2018 16:22
Scrapy + Splash example
import scrapy


# This example needs the scrapy-splash package (the renamed successor of scrapyjs): pip install scrapy-splash
# It also needs a Splash instance running in your env or on Scrapy Cloud (https://github.com/scrapinghub/splash)
class SplashSpider(scrapy.Spider):
    name = 'splash-spider'
    download_delay = 3

    def start_requests(self):
        # Route the request through Splash so the page's JavaScript runs before parsing.
        yield scrapy.Request(
            'http://quotes.toscrape.com/js', self.parse,
            meta={
                'splash': {
                    'endpoint': 'render.html',
                }
            }
        )

    def parse(self, response):
        # Debug aid: dump the rendered HTML returned by Splash.
        print(response.body)
        for quote in response.css('.quote'):
            yield {
                'text': quote.css('span::text').extract_first(),
                'author': quote.css('small::text').extract_first(),
                'tags': quote.css('.tags a::text').extract(),
            }
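
The spider only sets the 'splash' key in the request meta; the project settings still have to tell Scrapy where Splash lives and enable the middleware that acts on that key. The original gist does not show these settings, so the following is a minimal sketch of a settings.py, assuming the scrapy-splash package and a Splash instance listening on localhost port 8050 (both the URL and the choice of scrapy-splash over the older scrapyjs are assumptions, not part of the gist).

```python
# settings.py (sketch; not part of the original gist)

# Assumed address of a locally running Splash instance; adjust to your environment.
SPLASH_URL = 'http://localhost:8050'

# Middlewares that rewrite requests carrying meta['splash'] so they go through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Keeps duplicate Splash arguments from being stored repeatedly in the request queue.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

With settings along these lines in place, the spider can be run from the project directory with `scrapy crawl splash-spider`.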