1060460048 / imdb_details_page_spider.py
Created August 7, 2016 13:47 — forked from premit/imdb_details_page_spider.py
Scrapy reference: Crawling scraped links & next pagination
'''
Spider for IMDb
- Retrieve the most popular movies & TV series rated 8.0 or above that have at least 5 award nominations
- Crawl next pages recursively
- Follow the details pages of scraped films to retrieve more information about each film
'''
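# scrapy.contrib and SgmlLinkExtractor have been deprecated since Scrapy 1.0;
# the imports below use the current module paths. LinkExtractor is aliased to
# the old name so existing references to SgmlLinkExtractor keep working.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor as SgmlLinkExtractor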
from scrapy.selector import Selector