Created
June 24, 2020 06:45
Capture scrapy crawler results to a Python list
import scrapy
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.signalmanager import dispatcher


class MySpider(scrapy.Spider):
    ...


# gather the results, see: https://stackoverflow.com/a/40240712
nurseries = []


def crawler_results(signal, sender, item, response, spider):
    # called for every item that passes all item pipeline stages
    nurseries.append(item)


# item_scraped is the current name of the older item_passed signal
dispatcher.connect(crawler_results, signal=signals.item_scraped)

# Run the Spider
process = CrawlerProcess()
process.crawl(MySpider)
process.start()  # blocks until the crawl finishes; nurseries is filled by then
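
The same idea can be wrapped in a function so the results list is not a module-level global. This is only a sketch under some assumptions: `make_item_collector` and `run_and_collect` are hypothetical helper names, not part of Scrapy, and the handler is connected through the crawler's own signal manager (`crawler.signals.connect`) rather than through the global `dispatcher`, so it only receives items from that one crawler.

```python
def make_item_collector():
    """Hypothetical helper: return (items, handler).

    The handler appends every item it receives to the items list.
    """
    items = []

    def handler(item, response=None, spider=None, **kwargs):
        # **kwargs absorbs the extra `signal` and `sender` arguments
        items.append(item)

    return items, handler


def run_and_collect(spider_cls, **spider_kwargs):
    """Hypothetical helper: run one spider and return its items as a list."""
    # Imported here so make_item_collector can be used without Scrapy installed.
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    # Keep `handler` referenced in this scope while the crawl runs:
    # signal receivers are held by weak reference by default.
    items, handler = make_item_collector()

    process = CrawlerProcess()
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(handler, signal=signals.item_scraped)
    process.crawl(crawler, **spider_kwargs)
    process.start()  # blocks until the crawl finishes
    return items
```

With the spider from the snippet above, usage would look like `nurseries = run_and_collect(MySpider)`. As with the original, `process.start()` runs the Twisted reactor, which cannot be restarted, so this should be called once per Python process.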