
@McKay1717
Forked from eladkehat/alexatopsites.rb
Last active October 10, 2015 14:36
Get the list of top sites for a given country from Alexa
# Get the list of top sites for a given country from Alexa, found here:
# http://www.alexa.com/topsites/countries/<country code>
# and transform it into a plain text file, one domain per line.
# Usage: ruby alexatopsites.rb [country code] [output file]
# The country code defaults to FR; the output file defaults to
# alexatop500.<country code>.txt.
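# 'rubygems' is loaded automatically since Ruby 1.9, so it is not required here.
require 'open-uri'  # provides URI.open for fetching pages over HTTP
require 'nokogiri'  # HTML parser gem; install with `gem install nokogiri`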
BASE_URL = "http://www.alexa.com/topsites/countries"
country = ARGV[0] || "FR"
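# Fetch one listing page and write each site name on its own line.
def parse_page(url, outfile)
  # Ruby 3.0 removed URL support from Kernel#open; URI.open (Ruby 2.5+)
  # is the open-uri entry point.
  doc = Nokogiri::HTML(URI.open(url))
  # The script assumes each site name sits in an element with class
  # "desc-paragraph"; strip removes surrounding whitespace and newlines.
  domains = doc.css(".desc-paragraph").map { |node| node.inner_text.strip }
  domains.each { |domain| outfile.puts domain }
end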
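outfile_name = ARGV[1] || "alexatop500.#{country}.txt"
# The block form of File.open closes the file even if a request fails.
File.open(outfile_name, 'w') do |outfile|
  # Alexa lists 25 sites per page, so pages 0..19 cover the top 500
  # (0.upto(20) would request a 21st page).
  20.times do |page|
    # Page 0 has no suffix; later pages insert ";<n>" before the country,
    # e.g. http://www.alexa.com/topsites/countries;1/FR
    url = BASE_URL + (page > 0 ? ";#{page}" : "") + "/#{country}"
    puts "Parsing page #{page + 1} at #{url}"
    parse_page(url, outfile)
  end
end
puts "Written to #{outfile_name}"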
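The pagination scheme the script relies on can be pulled out into a pure helper, which makes it testable without any network access. This is a minimal sketch: `page_urls` is a hypothetical name, the `;<n>` URL format is copied from the script above and may no longer be served by Alexa, and the default of 20 pages assumes 25 sites per page (top 500, as in the default output filename).

```ruby
# Hypothetical helper: builds the listing URLs for one country using the
# same scheme as the script (page 0 has no ";<n>" suffix).
def page_urls(country, pages = 20, base: "http://www.alexa.com/topsites/countries")
  (0...pages).map do |page|
    suffix = page > 0 ? ";#{page}" : ""
    "#{base}#{suffix}/#{country}"
  end
end

urls = page_urls("FR")
puts urls.first  # prints http://www.alexa.com/topsites/countries/FR
puts urls.last   # prints http://www.alexa.com/topsites/countries;19/FR
puts urls.size   # prints 20
```

The main loop could then iterate over `page_urls(country)` and pass each URL to `parse_page`, keeping URL construction separate from fetching and parsing.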