@jaimeiniesta
Created August 18, 2010 14:32
# encoding: UTF-8
# Guess the charset encoding of a given input by putting it to a vote among several guessers.
require 'rubygems'
require 'open-uri'
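if RUBY_VERSION > "1.9"
  # String#encoding is only available on Ruby 1.9 and later.
  # Note: this reports the encoding the string is tagged with,
  # it does not inspect the bytes.
  class Yukihiro
    def guess_charset(str)
      str.encoding.name
    end
  end
else
  # The charguess gem doesn't compile on 1.9, so it is only used on older Rubies.
  class Ernesto
    require 'charguess'
    def guess_charset(str)
      CharGuess.guess(str)
    end
  end
end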
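# Asks the Unix `file` command to guess the encoding from the raw bytes.
class Guillermo
  def guess_charset(str)
    IO.popen("file --mime-encoding -", 'w+') do |f|
      f.write str
      f.close_write
      # Output looks like "/dev/stdin: utf-8"; keep only the last word.
      f.read.strip.split.last
    end
  end
end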
class Jury
  def initialize
    @voters = [Guillermo.new, (RUBY_VERSION > "1.9" ? Yukihiro.new : Ernesto.new)]
  end

  # Prints each voter's guess; the votes are reported, not tallied.
  def decide_charset(str)
    @voters.each do |voter|
      puts "#{voter.class} thinks this is #{voter.guess_charset(str)}"
    end
  end
end
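puts "Ruby version: #{RUBY_VERSION}"
jury = Jury.new
%w(http://www.alazan.com http://www.seriesyonquis.com http://www.hola.com http://www.welcome2japan.cn http://www.ruby-lang.org).each do |url|
  puts "\nVoting for the charset of #{url}"
  # Kernel#open no longer accepts URLs on Ruby 3.0+; use URI.open where open-uri provides it.
  page = URI.respond_to?(:open) ? URI.open(url) : open(url)
  jury.decide_charset(page.read)
end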
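
The script above prints each voter's guess but never picks a winner. A minimal sketch of how the votes could be tallied follows. The `tally_votes` helper is hypothetical and not part of the original gist, and the tie-breaking rule (first guess wins) is an assumption.

```ruby
# Hypothetical helper (not in the original gist): return the charset
# name that most voters agree on, or nil if nobody produced a guess.
def tally_votes(guesses)
  # Normalize so "UTF-8" and "utf-8" count as the same vote; drop empty guesses.
  normalized = guesses.compact.map { |g| g.to_s.strip.downcase }.reject { |g| g.empty? }
  return nil if normalized.empty?

  counts = Hash.new(0)
  normalized.each { |name| counts[name] += 1 }

  # Highest count wins; on a tie, the guess that appeared first wins (an assumption).
  best = counts.values.max
  normalized.find { |name| counts[name] == best }
end

puts tally_votes(["UTF-8", "utf-8", "ISO-8859-1"]) # prints utf-8
```

With only two voters, as in the script, ties are likely, so in practice the order of the voters would decide the result.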