@mostlyfine
Created November 17, 2010 11:31
libro news page parser
require 'rubygems'
require 'nokogiri'
require 'open-uri'
# Ruby 1.9+ handles UTF-8 natively; 'jcode' and $KCODE existed only in Ruby 1.8.
page_content = Struct.new('PageContent', :category, :title, :body, :updated_at)
# Use URI.open: on Ruby 3.0+ Kernel#open no longer accepts URLs.
content = Nokogiri::HTML.parse(URI.open('http://www.libro.jp/news/').read)
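# Build one PageContent per news entry on the page.
contents = content.xpath("//div[@class='entry']").map do |entry|
  page = page_content.new(
    entry.xpath("p[@class='category_txt']").text,
    # Collapse whitespace runs (including full-width spaces) into one space.
    entry.xpath("h2").text.gsub(/[[:space:]]+/, ' '),
    # Note: #text already strips tags, so the <br> substitution only affects literal "<br>" text.
    entry.xpath("div[@class='entry_body']/div[@class='text']").text.gsub(/[[:space:]]+/, ' ').gsub(/<br>/, "\n"),
    Time.now # placeholder until the date is parsed from the title
  )
  # Titles carry a date such as "11月17日"; convert full-width digits to ASCII before matching.
  day = /([0-9]+)月([0-9]+)日/.match(page.title.tr('０-９', '0-9')).to_a
  # Keep the Time.now placeholder when the title contains no date.
  page.updated_at = Time.local(Time.now.year, day[1], day[2]) unless day.empty?
  page
end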
puts contents
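
The date handling in the script can be tried offline with a small sketch like the one below. The helper name `parse_title_date` and the sample title are made up for illustration and are not part of the original gist. It assumes Ruby 2.0 or later, so that the source file is read as UTF-8.

```ruby
# Hypothetical helper (not in the original script): extract a "M月D日" date
# from a title and return a Time, or nil when the title has no date.
def parse_title_date(title, year = Time.now.year)
  # Convert full-width digits to ASCII so one regexp covers both forms.
  normalized = title.tr('０-９', '0-9')
  match = /([0-9]+)月([0-9]+)日/.match(normalized)
  return nil unless match

  Time.local(year, match[1].to_i, match[2].to_i)
end

# Sample title invented for this example.
t = parse_title_date('１１月１７日 新刊のお知らせ', 2010)
puts t.strftime('%Y-%m-%d')
```

Returning `nil` for titles without a date lets the caller choose a fallback, rather than silently defaulting to January 1.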