@n3rio
Created January 11, 2019 19:37
Run a spider forever by modifying the start_requests() method. (private)
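# Execution starts in start_requests()
from scrapy import Request, Spider


class Foo(Spider):
    name = 'foo'
    allowed_domains = ['foo.com']

    def start_requests(self):
        # self.coll is assumed to be a pymongo collection set up elsewhere (not shown here).
        while True:
            # Materialize the batch first: a pymongo Cursor is always truthy,
            # so testing the cursor itself with `if not data` would never break.
            data = list(self.coll.find({'status': 'unscraped'}).limit(5000))
            if not data:
                break
            for row in data:
                pin = row['pin']
                url = 'http://foo.com/Pages/PIN-Results.aspx?PIN={}'.format(pin)
                yield Request(url, meta={'pin': pin})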
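
The batching loop above can be tried without Scrapy or MongoDB. The following is a minimal sketch, not the gist author's code: `iter_unscraped`, `fetch_batch`, and `rows` are hypothetical names, and the in-memory list only stands in for the collection. It illustrates one property of the pattern: the loop ends only if something marks rows as no longer 'unscraped'. In the spider that would presumably happen in a callback or pipeline, which the gist does not show.

```python
def iter_unscraped(fetch_batch, batch_size=5000):
    """Yield rows batch by batch until fetch_batch returns nothing."""
    while True:
        # Materialize the batch so the emptiness check is reliable.
        batch = list(fetch_batch(batch_size))
        if not batch:
            break
        for row in batch:
            yield row


# Hypothetical in-memory stand-in for the database collection.
rows = [{'pin': n, 'status': 'unscraped'} for n in range(7)]


def fetch_batch(limit):
    """Return up to `limit` rows that are still marked 'unscraped'."""
    pending = [r for r in rows if r['status'] == 'unscraped']
    return pending[:limit]


seen = []
for row in iter_unscraped(fetch_batch, batch_size=3):
    # Assumption: the real spider would update the status after scraping.
    row['status'] = 'scraped'
    seen.append(row['pin'])

print(seen)  # [0, 1, 2, 3, 4, 5, 6]
```

If the status were never updated, `fetch_batch` would keep returning the same rows and the generator would never finish.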