@xlab
Created October 25, 2012 16:40
Habra comments parser
#!/usr/bin/env ruby
# encoding: utf-8
require 'nokogiri'
require 'open-uri'
require 'yaml'
###################################################
# ♥ http://habrahabr.ru/users/bupycnet/comments/
# The nick to analyse is the last command-line argument.
if ARGV.empty?
  puts "Usage: ./parser.rb nick"
  exit
end
NICK = ARGV.pop
CACHE = "cache-#{NICK}.yaml"
###################################################
class Parser
  attr_reader :nick
  def initialize nick
    @nick = nick
  end
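  # Fetch a page relative to the site root and parse it as HTML.
  def get_document relative
    # Kernel#open no longer opens URLs as of Ruby 3.0; open-uri provides URI.open.
    Nokogiri::HTML(URI.open("http://habrahabr.ru/#{relative}",
                            "User-Agent" => "Mozilla/5.0"))
  end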
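  # Number of the last comments page, read from the pagination block of a
  # page far past the end (page999). Page numbers are compared numerically
  # instead of relying on node ordering; falls back to 1 without pagination.
  def get_pages
    get_document("users/#{@nick}/comments/page999/")
      .css('#nav-pages li').map { |li| li.content.to_i }.max || 1
  end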
  def get_comments page
    get_document("users/#{@nick}/comments/page#{page}/")
      .css('.comment_item .message')
      .map { |html| html.content.strip }
  end
  def get_all_comments
    (1..get_pages.to_i).flat_map { |page| get_comments(page) }
  end
end
def run
  # Load comments from the YAML cache, or fetch them and write the cache.
  if File.exist? CACHE
    comments = YAML.load_file(CACHE)
  else
    # Fetch before creating the cache file so a failed download
    # does not leave an empty cache behind.
    comments = Parser.new(NICK).get_all_comments
    File.write(CACHE, YAML.dump(comments))
  end
  puts "User: #{NICK}"
  puts "Total comments: #{comments.size}"
  # Start from 0 so the first comment's length is counted too.
  overall = comments.inject(0) { |sum, e| sum + e.length }
  puts "Total length: #{overall}"
  puts "Average comment length: #{comments.empty? ? 0 : overall / comments.size}"
  puts "Lengths of the 10 longest comments:"
  p comments.map { |comment| comment.length }.sort.reverse.first(10)
end
run
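
The statistics step can be checked without touching the network. The sketch below is not part of the original script: `comment_stats` and its `top:` parameter are hypothetical names, and the sample strings are made up. It computes the same four figures that `run` prints, using a numeric sum so that a comment that happens to start with digits cannot skew the total.

```ruby
# Hypothetical helper (not in the original gist): computes the figures
# that `run` prints, from an in-memory array of comment strings.
def comment_stats(comments, top: 10)
  lengths = comments.map(&:length)
  total = lengths.sum
  {
    count: lengths.size,
    total: total,
    # Integer division, as in the script; 0 for an empty list.
    average: lengths.empty? ? 0 : total / lengths.size,
    # Array#max(n) returns the n largest values in descending order.
    longest: lengths.max(top)
  }
end

# Made-up sample data; the first comment deliberately starts with a digit.
sample = ["3 things to note", "short", "a much longer comment body"]
stats = comment_stats(sample)
puts "Total comments: #{stats[:count]}"
puts "Total length: #{stats[:total]}"
puts "Average comment length: #{stats[:average]}"
p stats[:longest]
```

Keeping the arithmetic in a function that takes a plain array makes it possible to run it against a cached YAML file or a hand-written list alike.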