@mlc
Created January 5, 2013 01:57
sha1 hashing all the locations in a GeoIP database.

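to try these scripts, grab the free MaxMind database, unpack it, and then edit the GeoLiteCity-Location.csv file to remove the first (copyright) line. that leaves the column names as the first line, which is what the scripts expect, since they read the file with :headers => true.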
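
if you would rather not edit the file by hand, the copyright line can also be dropped with a few lines of Ruby. this is only a sketch: `drop_first_line` and the `.orig` file name in the usage comment are made-up names, not part of the original scripts, and it assumes the copyright notice is exactly one line long.

```ruby
# Hypothetical helper (not part of the original scripts): copy `src` to `dest`,
# skipping the first line. Binary mode is used so the ISO-8859-1 bytes pass
# through untouched.
def drop_first_line(src, dest)
  File.open(src, "rb") do |input|
    input.gets # read and discard the first (copyright) line
    File.open(dest, "wb") do |output|
      input.each_line { |line| output.write(line) }
    end
  end
end

# Possible usage, assuming the unpacked file was first renamed to *.orig:
#   drop_first_line('GeoLiteCity-Location.csv.orig', 'GeoLiteCity-Location.csv')
```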

on my laptop, the first script, which merely parses the data, converts it to UTF-8, and writes it back out as CSV, took 22.7 seconds of CPU time; the second script, which additionally computes a SHA1 hash of the city, state (or other region), and country for each row, took 28.7 seconds. that is only 6 seconds (26%) more.
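
to get a feel for the numbers on your own machine, something like the following could be used. it is a rough sketch, not how the times above were measured (that isn't recorded here): `cpu_seconds` and `overhead_percent` are made-up helper names, and the strings being hashed are invented stand-ins for real location rows.

```ruby
require 'benchmark'
require 'digest/sha1'

# Hypothetical helper: CPU seconds (user + system) spent running the block.
def cpu_seconds
  tms = Benchmark.measure { yield }
  tms.utime + tms.stime
end

# Hypothetical helper: extra cost of `with_hash` relative to `baseline`, in percent.
def overhead_percent(baseline, with_hash)
  (with_hash - baseline) / baseline * 100.0
end

# Time only the hashing step, on made-up input strings.
hashing = cpu_seconds do
  100_000.times { |i| Digest::SHA1.hexdigest("City #{i}, Region, Country") }
end
puts format("hashing 100,000 strings: %.3f CPU seconds", hashing)

# The figures quoted above: 22.7 s without hashing, 28.7 s with it.
puts format("overhead: %.0f%%", overhead_percent(22.7, 28.7))
```

timing the hashing step on its own like this leaves out the CSV parsing and writing, so it only gives a rough idea of how small the hashing cost is next to the rest of the work.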

you need Ruby 1.9.x, which is pretty much standard these days.

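#!/usr/bin/env ruby
# baseline: parse the CSV, transcode to UTF-8, and write it back out unchanged
require 'csv'
require 'digest/sha1' # not used by this script

CSV.open('locations-with-sha1.csv', "w:utf-8", :headers => true) do |out|
  # read as ISO-8859-1 and transcode each row to UTF-8
  CSV.foreach('GeoLiteCity-Location.csv', :encoding => "iso-8859-1:utf-8", :headers => true) do |row|
    out << row
  end
end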
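
#!/usr/bin/env ruby
# same as the baseline, plus a SHA1 hash of each row's location
require 'csv'
require 'digest/sha1'

CSV.open('locations-with-sha1.csv', "w:utf-8", :headers => true) do |out|
  CSV.foreach('GeoLiteCity-Location.csv', :encoding => "iso-8859-1:utf-8", :headers => true) do |row|
    # hash input looks like "City, Region, Country"; missing fields join as empty strings
    str = [row["city"], row["region"], row["country"]].join(', ')
    row["hash"] = Digest::SHA1.hexdigest(str) # appended as a new last field
    out << row
  end
end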
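
to see what actually gets hashed for a single location, the two lines that build the string and hash it can be pulled out on their own. this is a small sketch: `location_key` and `location_hash` are made-up names, and the sample values are invented rather than taken from the database.

```ruby
require 'digest/sha1'

# Hypothetical helper: build the string that gets hashed, the same way the
# script does (city, region, country joined with ", ").
def location_key(city, region, country)
  [city, region, country].join(', ')
end

# Hypothetical helper: 40-character hex SHA1 digest of that string.
def location_hash(city, region, country)
  Digest::SHA1.hexdigest(location_key(city, region, country))
end

# Invented sample values. A nil region joins as an empty string, so the
# separators are still present in the hashed text.
puts location_key("Springfield", "IL", "US")
puts location_key("London", nil, "GB")
puts location_hash("Springfield", "IL", "US")
```

since the separators stay in place when a field is missing, a row with no region hashes differently from a row whose city and country happen to run together the same way.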