Created
July 3, 2012 16:41
unless @_mGotData
  # If the data starts with a BOM, we know it is UTF.
  # Note: on Ruby 1.9+ these comparisons only match when `data` has the
  # same encoding as the string literals (e.g. both binary).
  # The 4-byte BOMs are tested before the 2-byte ones because the UTF-32LE
  # BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
  if data[0,3] == "\xEF\xBB\xBF"
    # EF BB BF  UTF-8 with BOM
    @result = {"encoding" => "UTF-8", "confidence" => 1.0}
  elsif data[0,4] == "\xFF\xFE\x00\x00"
    # FF FE 00 00  UTF-32, little-endian BOM
    @result = {"encoding" => "UTF-32LE", "confidence" => 1.0}
  elsif data[0,4] == "\x00\x00\xFE\xFF"
    # 00 00 FE FF  UTF-32, big-endian BOM
    @result = {"encoding" => "UTF-32BE", "confidence" => 1.0}
  elsif data[0,4] == "\xFE\xFF\x00\x00"
    # FE FF 00 00  UCS-4, unusual octet order BOM (3412)
    @result = {"encoding" => "X-ISO-10646-UCS-4-3412", "confidence" => 1.0}
  elsif data[0,4] == "\x00\x00\xFF\xFE"
    # 00 00 FF FE  UCS-4, unusual octet order BOM (2143)
    @result = {"encoding" => "X-ISO-10646-UCS-4-2143", "confidence" => 1.0}
  elsif data[0,2] == "\xFF\xFE"
    # FF FE  UTF-16, little-endian BOM (2-byte slice; a 4-byte slice can never equal a 2-byte literal)
    @result = {"encoding" => "UTF-16LE", "confidence" => 1.0}
  elsif data[0,2] == "\xFE\xFF"
    # FE FF  UTF-16, big-endian BOM
    @result = {"encoding" => "UTF-16BE", "confidence" => 1.0}
  end
end
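
The same BOM checks can also be written as a lookup table. The following is a minimal sketch, not the original author's code: `BOM_TABLE` and `detect_bom` are hypothetical names, and it assumes Ruby 2.0+ for `String#b`, which is used so that the comparison is done on raw bytes regardless of the encoding tag on `data`.

```ruby
# Hypothetical table-driven variant of the BOM check above (assumes Ruby 2.0+).
# Entries are ordered so that 4-byte BOMs are tried before the 2-byte ones:
# the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
BOM_TABLE = [
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFE\xFF\x00\x00".b, "X-ISO-10646-UCS-4-3412"],
  ["\x00\x00\xFF\xFE".b, "X-ISO-10646-UCS-4-2143"],
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFF\xFE".b,         "UTF-16LE"],
  ["\xFE\xFF".b,         "UTF-16BE"]
].freeze

# Returns a result hash in the same shape as @result above,
# or nil when the data does not start with a known BOM.
def detect_bom(data)
  bytes = data.b # binary copy, so byte comparison ignores the encoding tag
  BOM_TABLE.each do |bom, encoding|
    if bytes.start_with?(bom)
      return { "encoding" => encoding, "confidence" => 1.0 }
    end
  end
  nil
end

detect_bom("\xEF\xBB\xBFhello")["encoding"] # => "UTF-8"
```

Keeping the signatures in one ordered table makes the precedence rule explicit. Note that UTF-16LE text whose first character after the BOM is U+0000 is still reported as UTF-32LE, an ambiguity the original ordering shares.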