@imajes
Created July 3, 2012 16:41
unless @_mGotData
  # If the data starts with a BOM, we know it is UTF.
  # The 4-byte BOMs are tested before the 2-byte ones because
  # FF FE 00 00 (UTF-32LE) also begins with FF FE (UTF-16LE).
  if data[0, 3] == "\xEF\xBB\xBF"
    # EF BB BF: UTF-8 with BOM
    @result = {"encoding" => "UTF-8", "confidence" => 1.0}
  elsif data[0, 4] == "\xFF\xFE\x00\x00"
    # FF FE 00 00: UTF-32, little-endian BOM
    @result = {"encoding" => "UTF-32LE", "confidence" => 1.0}
  elsif data[0, 4] == "\x00\x00\xFE\xFF"
    # 00 00 FE FF: UTF-32, big-endian BOM
    @result = {"encoding" => "UTF-32BE", "confidence" => 1.0}
  elsif data[0, 4] == "\xFE\xFF\x00\x00"
    # FE FF 00 00: UCS-4, unusual octet order BOM (3412)
    @result = {"encoding" => "X-ISO-10646-UCS-4-3412", "confidence" => 1.0}
  elsif data[0, 4] == "\x00\x00\xFF\xFE"
    # 00 00 FF FE: UCS-4, unusual octet order BOM (2143)
    @result = {"encoding" => "X-ISO-10646-UCS-4-2143", "confidence" => 1.0}
  elsif data[0, 2] == "\xFF\xFE"
    # FF FE: UTF-16, little-endian BOM (compare 2 bytes, not 4)
    @result = {"encoding" => "UTF-16LE", "confidence" => 1.0}
  elsif data[0, 2] == "\xFE\xFF"
    # FE FF: UTF-16, big-endian BOM
    @result = {"encoding" => "UTF-16BE", "confidence" => 1.0}
  end
end
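
The same BOM checks can also be written as a lookup table. The following is only a sketch, not the gist author's method: `BOM_SIGNATURES` and `detect_bom` are hypothetical names, and it assumes Ruby 2.0 or later for `String#b`. Both sides are converted to binary before comparing, because on Ruby 1.9+ two strings holding the same non-ASCII bytes compare as unequal when their encodings are incompatible (for example a UTF-8 literal against data read as ASCII-8BIT).

```ruby
# Hypothetical table of BOM signatures (names are illustrative only).
# The 4-byte signatures come before the 2-byte ones so that UTF-32LE
# (FF FE 00 00) is tried before UTF-16LE (FF FE).
BOM_SIGNATURES = [
  ["\xEF\xBB\xBF".b,     "UTF-8"],
  ["\xFF\xFE\x00\x00".b, "UTF-32LE"],
  ["\x00\x00\xFE\xFF".b, "UTF-32BE"],
  ["\xFE\xFF\x00\x00".b, "X-ISO-10646-UCS-4-3412"],
  ["\x00\x00\xFF\xFE".b, "X-ISO-10646-UCS-4-2143"],
  ["\xFF\xFE".b,         "UTF-16LE"],
  ["\xFE\xFF".b,         "UTF-16BE"]
].freeze

# Returns a result hash shaped like the one above, or nil when no BOM is found.
def detect_bom(data)
  bytes = data.b # binary copy, so the comparison is byte-for-byte
  BOM_SIGNATURES.each do |signature, encoding|
    if bytes.start_with?(signature)
      return { "encoding" => encoding, "confidence" => 1.0 }
    end
  end
  nil
end
```

Calling `detect_bom(data)` with the first bytes read from a file returns the result hash for a recognised BOM and `nil` otherwise, so the caller can fall through to the statistical probers.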