@h5y1m141
Created August 20, 2012 05:40
parse coffee meeting data
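require 'rubygems'
require 'nokogiri'
require 'open-uri'

# Fetches coffeemeeting.jp hour pages and prints the fields scraped from each.
class Crawler
  def run
    base_url = 'http://coffeemeeting.jp/hours/'
    # Visit page IDs 1, 11, 21, ..., 101.
    0.upto(10) do |i|
      count = (i * 10) + 1
      begin
        # URI.open is the open-uri entry point on Ruby 2.5+;
        # Kernel#open no longer accepts URLs as of Ruby 3.0.
        http = URI.open(
          base_url + count.to_s,
          "User-Agent" => "My Agent",
          "From" => "xxxx@mydomain",
          "Referer" => "http://mydomain/"
        )
      rescue OpenURI::HTTPError => e
        # Skip this page on an HTTP error instead of parsing a nil response.
        e.io.close
        next
      end
      doc = Nokogiri::HTML(http)
      # Pull each field out of the page by XPath.
      entry_data = {
        :hourdate => doc.search('//p[@class="hourdate"]').text,
        :hourspotname => doc.search('//div[@class="hourspotname"]').text,
        :hourlocation => doc.search('//p[@class="hourlocation"]').text,
        :meeting_owner => doc.search('//div[@id="left-sidebar"]/div/div/p').text
      }
      puts entry_data
      # Pause between requests to avoid hammering the server.
      sleep(2)
    end
  end
end

c = Crawler.new
c.run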
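
The loop in `Crawler#run` does not walk every page: it steps the ID by ten each time. A minimal sketch, using a hypothetical helper `page_urls` that is not part of the gist, lists the URLs the crawler would request without touching the network:

```ruby
# Hypothetical helper (not in the original gist): builds the same URLs
# that Crawler#run requests, using the same (i * 10) + 1 arithmetic.
def page_urls(base_url = 'http://coffeemeeting.jp/hours/', steps = 10)
  (0..steps).map { |i| base_url + ((i * 10) + 1).to_s }
end

urls = page_urls
puts urls.first  # http://coffeemeeting.jp/hours/1
puts urls.last   # http://coffeemeeting.jp/hours/101
puts urls.size   # 11
```

Printing the list first is a cheap way to confirm the ID range before running the real crawl with its two-second delay per page.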