# encoding: utf-8
# Analyze Wikipedia page-view data (older pagecounts-raw format)
# https://dumps.wikimedia.org/other/pagecounts-raw/
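require "cgi"

filename = "20120301-000000-ja.txt"
list = []
# Each line looks like: <project> <page_title> <view_count> <bytes_transferred>
File.open(filename, "r:UTF-8") do |file|
  file.each_line do |text|
    begin
      # Keep only lines whose project code starts with "ja"
      next unless text =~ /^ja/
      data = text.split
      # data[1] is the URL-encoded title; data[-2] is the view count
      list.push({ title: CGI.unescape(data[1]), count: data[-2].to_i })
    rescue StandardError
      # Skip malformed lines (e.g. invalid UTF-8 byte sequences, missing fields)
    end
  end
end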
# Sort by view count, highest first
result = list.sort_by { |i| -i[:count].to_i }
# Show the top 20
result.first(20).each do |i|
  puts i
end