@wjlafrance
Created December 12, 2011 15:48
# Crawl a WoW forum thread and collect every character who posted, grouped by realm.
# Characters are not de-duplicated: someone who posts twice is listed twice.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'json'
thread_url = "http://us.battle.net/wow/en/forum/topic/3657428962"
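# Captures [realm, character] from a profile link such as /wow/en/character/<realm>/<name>/.
# [^/]+ keeps each capture inside a single path segment (the greedy .+ could span several).
regex = %r{/wow/en/character/([^/]+)/([^/]+)/}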
# Kernel#open no longer fetches URLs as of Ruby 3.0; open-uri provides URI.open instead.
doc = Nokogiri::HTML(URI.open(thread_url))
# Highest page number among the pagination links; a single-page thread has none, so fall back to 1.
pages = doc.xpath("//ul[@class='ui-pagination']/li/a").map { |link| link['href'].to_s[/[0-9]+/].to_i }.max || 1
puts "There are #{pages} pages in this thread."
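# realm => list of character names. The block form gives each realm its own array
# rather than one default array shared by every missing key.
realms = Hash.new { |hash, realm| hash[realm] = [] }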
(1..pages).each do |page|
  puts "Loading page #{page}"
  doc = Nokogiri::HTML(URI.open("#{thread_url}?page=#{page}"))
  # Each poster's avatar links to their character profile; skip links that don't match.
  characters = doc.xpath("//div[@class='avatar-interior']/a").map { |user_link| regex.match(user_link['href'].to_s) }.compact.map(&:captures)
  characters.each { |realm, character| realms[realm] += [character] }
end
puts realms.to_json
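
The header comment notes that the script does not uniquify characters, so anyone who posts more than once shows up more than once in the output. Below is a minimal sketch of one way to de-duplicate, assuming the scraped data is available as [realm, character] pairs. The `group_unique` helper and the sample names are hypothetical and not part of the original script.

```ruby
require 'json'
require 'set'

# Hypothetical helper (not in the original script): groups [realm, character]
# pairs into a realm => sorted, de-duplicated list of character names.
def group_unique(pairs)
  # A Set per realm silently drops repeated names.
  realms = Hash.new { |hash, realm| hash[realm] = Set.new }
  pairs.each { |realm, character| realms[realm] << character }
  # Convert the sets to sorted arrays so the result serializes cleanly to JSON.
  realms.transform_values { |names| names.to_a.sort }
end

# Made-up sample data standing in for the pairs scraped from the thread.
sample = [
  ['illidan', 'Thrallson'],
  ['illidan', 'Thrallson'],
  ['stormrage', 'Moonleaf'],
  ['illidan', 'Axeface']
]

puts group_unique(sample).to_json
```

In the script itself, the pairs collected on each page could be accumulated into one array and passed through a helper like this before printing.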