Created
March 12, 2013 23:28
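require "nokogiri"

class Nokogiri::HTML::Document
  # returns the charset declared in <meta http-equiv="content-type">, or nil
  def meta_encoding
    # match http-equiv case-insensitively ("Content-Type" is the common spelling)
    # and skip tags without a content attribute
    meta = css("meta[http-equiv]").find do |m|
      m["http-equiv"].downcase == "content-type" && m["content"]
    end
    return unless meta

    # content looks like "text/html; charset=utf-8"
    meta["content"].split(/\s*;\s*/).each do |part|
      charset = part[/\Acharset=(.*)/i, 1]
      return charset if charset
    end
    nil
  end
end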
module Nokogiri::HTML
  def self.with_meta_encoding(data)
    doc = Nokogiri.HTML(data)
    meta_encoding = doc.meta_encoding
    return doc unless meta_encoding && doc.encoding != meta_encoding

    # try to reread with meta_encoding
    doc2 = Nokogiri.HTML(data, nil, meta_encoding)
    return doc2 if doc2.encoding == meta_encoding

    # rereading failed, return original document
    doc
  end
end
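The charset lookup in meta_encoding comes down to splitting a Content-Type value on semicolons and picking out the charset parameter. A minimal standalone sketch of that step, in plain Ruby without Nokogiri, could look like the following. The helper name charset_from_content_type is hypothetical and not part of the gist, and the quote stripping is an added assumption about the inputs one might see.

```ruby
# hypothetical helper (not part of the gist): extracts the charset parameter
# from a Content-Type value such as "text/html; charset=ISO-8859-1"
def charset_from_content_type(content_type)
  return nil if content_type.nil?

  content_type.split(";").each do |part|
    # each parameter looks like "charset=utf-8"; match the name case-insensitively
    match = part.strip.match(/\Acharset=(.+)\z/i)
    next unless match

    # assumption: drop optional surrounding quotes, e.g. charset="utf-8"
    return match[1].delete(%q("'))
  end
  nil
end

p charset_from_content_type("text/html; charset=ISO-8859-1") # => "ISO-8859-1"
p charset_from_content_type("text/html")                     # => nil
```

With the gist loaded, the full round trip would be Nokogiri::HTML.with_meta_encoding(data), which returns the reparsed document only when the reparse reports the declared encoding.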