@dokipen
Created January 31, 2011 21:21
Parse Hacker News front page.
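# Assumed to be loaded by the surrounding Camping app; listed for completeness.
# Kernel#open only accepts URLs with open-uri (on Ruby 3.0+ use URI.open instead).
require 'open-uri'
require 'hpricot'
require 'embedly'

class Update < R '/update'
  def get
    articles = []
    doc = Hpricot(open('http://news.ycombinator.com/'))

    # Select the parent row of every .subtext cell; the node before it
    # is the row that holds the story's rank, title, and link.
    (doc/'.subtext/..').each do |subtext|
      article = subtext.previous_node
      articles << {
        # The first .title cell holds the rank; the title link sits in a later one.
        :rank          => article.at('.title').inner_html.strip,
        :title         => article.at('.title/a').inner_html.strip,
        :link          => article.at('.title/a')[:href],
        # The last link in the subtext row points at the comment thread.
        :comments      => Article.normalize_url(subtext.at('a:last')[:href]),
        :comment_count => subtext.at('a:last').inner_html[/\d+/].to_i,
        :author        => subtext.at('a').inner_html.strip,
        :points        => subtext.at('span').inner_html[/\d+/].to_i
      }
    end

    if TOP_COMMENT
      # [snip]
    end

    # GET Embedly Pro Preview metadata, but only for URLs not stored yet.
    urls = articles.collect do |a|
      Article.normalize_url(a[:link])
    end.reject do |a|
      Preview.key_exists? a
    end

    unless urls.empty?
      api = ::Embedly::API.new :key => EMBEDLY_KEY
      # Results are matched back to their URLs by position.
      api.preview(:urls => urls, :maxwidth => 200).
        each_with_index do |preview, i|
          Preview.save_preview urls[i], preview
        end
    end

    # Replace the stored front page with the freshly scraped one.
    Article.delete_all
    Article.create articles
    redirect R(Index)
  end
end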
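
The scraper reads the comment count and the points out of link text with `inner_html[/\d+/].to_i`. The sketch below isolates that idiom using only the standard library. The helper name `first_number` and the sample strings are hypothetical and were not taken from a real page.

```ruby
# Hypothetical helper mirroring the inner_html[/\d+/].to_i idiom.
# String#[] with a Regexp returns the first match, or nil when nothing matches.
# nil.to_i is 0, so text without digits yields 0 instead of raising.
def first_number(text)
  text[/\d+/].to_i
end

puts first_number('42 comments')  # 42
puts first_number('discuss')      # 0
puts first_number('128 points')   # 128
```

Only the first run of digits is used, so a string such as `'1,024 points'` would yield 1. Whether the page ever formats numbers that way is not something the original snippet tells us.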
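
The preview step filters out URLs that are already stored, fetches the rest in one batch, and saves each result under the URL at the same position. Here is a minimal sketch of that pattern with no network access. `PreviewCache`, `fake_batch_preview`, and `refresh_previews` are hypothetical stand-ins, not the app's `Preview` model or the Embedly client. Like the `urls[i]` indexing in the original, the sketch assumes the batch call returns results in request order.

```ruby
# Hypothetical in-memory stand-in for the Preview store.
class PreviewCache
  def initialize
    @store = {}
  end

  def key_exists?(url)
    @store.key?(url)
  end

  def save_preview(url, preview)
    @store[url] = preview
  end

  def [](url)
    @store[url]
  end
end

# Hypothetical stand-in for a batch API call: one result per URL,
# returned in the same order as the input (an assumption).
def fake_batch_preview(urls)
  urls.map { |u| { title: "preview of #{u}" } }
end

# Fetch and store previews only for links the cache does not hold yet.
# Returns the list of URLs that were actually fetched.
def refresh_previews(cache, links)
  missing = links.reject { |u| cache.key_exists?(u) }
  return [] if missing.empty?

  # zip pairs each URL with the result at the same position,
  # which is equivalent to indexing urls[i] inside each_with_index.
  missing.zip(fake_batch_preview(missing)).each do |url, preview|
    cache.save_preview(url, preview)
  end
  missing
end

cache = PreviewCache.new
cache.save_preview('http://a.example/', { title: 'cached' })
fetched = refresh_previews(cache, ['http://a.example/', 'http://b.example/'])
p fetched
```

Pairing with `zip` avoids the manual index, but it depends on the same ordering guarantee. If the API could drop or reorder entries, matching on a URL field in each result would be the safer choice.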