@lrvick
Created March 1, 2012 03:54
Scrapy spider to output Hacker News titles
import scrapy


class HackernewsSpider(scrapy.Spider):
    """Print the link text found in the title cells of the Hacker News front page."""
    name = 'hackernews'
    allowed_domains = []
    start_urls = ['http://news.ycombinator.com']

    def parse(self, response):
        # Only handle responses that came from Hacker News.
        if 'news.ycombinator.com' in response.url:
            # Text of every link inside a <td class="title"> cell.
            titles = response.xpath('//td[@class="title"]//a/text()')
            for title in titles:
                print(title.extract())
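
To see what the spider's XPath query picks up without running a crawl, here is a minimal sketch that applies an equivalent query to a small hand-written fragment using only the standard library. The `SAMPLE` markup and the `extract_titles` helper are assumptions made up for illustration; the real page is more complex and may not parse as strict XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the front-page markup (an assumption,
# not the real page): each story link sits inside a <td class="title"> cell.
SAMPLE = """
<table>
  <tr><td class="title"><a href="http://example.com/a">First story</a></td></tr>
  <tr><td class="subtext"><a href="http://example.com/u">someuser</a></td></tr>
  <tr><td class="title"><span><a href="http://example.com/b">Second story</a></span></td></tr>
</table>
"""


def extract_titles(markup):
    """Return the link text found under every <td class="title"> cell."""
    root = ET.fromstring(markup)
    # ElementTree supports a subset of XPath; this path mirrors the
    # spider's '//td[@class="title"]//a/text()' query.
    return [a.text for a in root.findall('.//td[@class="title"]//a')]


titles = extract_titles(SAMPLE)
print(titles)
```

Links outside a `title` cell (the `subtext` row here) are skipped, while links nested deeper inside a `title` cell are still matched because of the `//a` descendant step.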