Skip to content

Instantly share code, notes, and snippets.

@matiskay
Created January 6, 2012 01:19
Show Gist options
  • Save matiskay/1568370 to your computer and use it in GitHub Desktop.
Save matiskay/1568370 to your computer and use it in GitHub Desktop.
Idea to get the spider works
def parse(self,response):
hxs = HtmlXPathSelector(response)
companies=hxs.select('//div[contains(@class,"rezultatPretrage")]/h2/a/@href').extract()
for url in companies:
yield Request(urljoin_rfc(get_base_url(response), url), callback=self.parseCompanyData)
nexturl=hxs.select('//div[@class="stranicePretrage"]/a/img[@alt="Next"]/..').select('@href').extract()
if nexturl:
yield Request(urljoin_rfc(get_base_url(response), nexturl[0]), callback=self.parse)
def parseCompanyData(self, response):
@bmanojlovic
Copy link

You are welcome :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment