Last active
August 29, 2015 14:02
https://www.ruby-forum.com/topic/4980931 - Gist of an irb session messing around with encodings
file = File.open "acao_e_reacao_utf16_le.txt"
=> #<File:acao_e_reacao_utf16_le.txt>
file.methods.grep /enc/
=> [:external_encoding, :internal_encoding, :set_encoding]
file.external_encoding
=> #<Encoding:UTF-8>
file.internal_encoding
=> nil
str = file.read
=> "A\u0000\xE7\u0000\xE3\u0000o\u0000 \u0000e\u0000 \u0000R\u0000e\u0000a\u0000\xE7\u0000\xE3\u0000o\u0000"
str.encoding
=> #<Encoding:UTF-8>
str.encode(Encoding::UTF_8)
=> "A\u0000\xE7\u0000\xE3\u0000o\u0000 \u0000e\u0000 \u0000R\u0000e\u0000a\u0000\xE7\u0000\xE3\u0000o\u0000"
str.encode(Encoding::UTF_8, Encoding::UTF_16)
Encoding::InvalidByteSequenceError: "A\x00" on UTF-16
from (irb):8:in `encode'
from (irb):8
from /home/abinoam/.rvm/rubies/ruby-2.1.1/bin/irb:11:in `<main>'
str.encode(Encoding::UTF_8, Encoding::UTF_16BE)
=> "䄀漀 攀 刀攀愀漀"
str.encode(Encoding::UTF_8, Encoding::UTF_16LE)
=> "Ação e Reação" # THE RIGHT SOURCE ENCODING
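# A minimal sketch of the same fix as a script, assuming the file holds
# UTF-16LE bytes without a BOM (the session output starts with "A\u0000").
# The bare UTF-16 attempt above probably fails because Ruby expects a byte
# order mark for that encoding. Here force_encoding relabels the bytes and
# encode then transcodes them. The helper name utf16le_bytes_to_utf8 and the
# temp file are hypothetical, used only for illustration.

```ruby
require "tempfile"

# Hypothetical helper, not part of the original session: relabel the raw
# bytes as UTF-16LE (no bytes change), then transcode them to UTF-8.
def utf16le_bytes_to_utf8(raw)
  raw.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
end

# "Ação e Reação", written with \u escapes so the literal is always UTF-8.
expected = "A\u00E7\u00E3o e Rea\u00E7\u00E3o"
misread = nil

Tempfile.create("acao_e_reacao") do |tmp|
  # Write the sample as UTF-16LE bytes, without a BOM.
  File.binwrite(tmp.path, expected.encode(Encoding::UTF_16LE))
  # Read it back labeled as UTF-8, as in the first half of the session.
  misread = File.read(tmp.path, encoding: "UTF-8")
end

puts misread.valid_encoding?                     # false: the UTF-8 label does not fit the bytes
puts utf16le_bytes_to_utf8(misread) == expected  # true
```

# Checking valid_encoding? right after a read is a cheap way to notice that
# the encoding label and the actual bytes disagree.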
# Another approach
# Set the encoding on the opening of the file
file = File.open "acao_e_reacao_utf16_le.txt", "rb:utf-16le"
=> #<File:acao_e_reacao_utf16_le.txt>
file.external_encoding
=> #<Encoding:UTF-16LE>
file.internal_encoding
=> nil
str = file.read
=> "A\u00E7\u00E3o e Rea\u00E7\u00E3o"
str.encoding
=> #<Encoding:UTF-16LE>
str.encode(Encoding::UTF_8)
=> "Ação e Reação"
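# A possible variation on this second approach: the mode string also accepts
# an internal encoding ("external:internal"), so read can return UTF-8
# directly and the separate encode call goes away. This is a sketch under the
# same assumption (UTF-16LE, no BOM). The helper name read_utf16le_as_utf8
# and the temp file are hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: open the file as UTF-16LE and let Ruby transcode
# to UTF-8 while reading.
def read_utf16le_as_utf8(path)
  File.open(path, "rb:UTF-16LE:UTF-8") do |f|
    f.read
  end
end

# "Ação e Reação", written with \u escapes so the literal is always UTF-8.
sample = "A\u00E7\u00E3o e Rea\u00E7\u00E3o"
decoded = nil

Tempfile.create("acao_e_reacao") do |tmp|
  # Write the sample as UTF-16LE bytes, without a BOM.
  File.binwrite(tmp.path, sample.encode(Encoding::UTF_16LE))
  decoded = read_utf16le_as_utf8(tmp.path)
end

puts decoded.encoding   # UTF-8
puts decoded == sample  # true
```

# Using the block form of File.open also closes the file handle, which the
# irb session above leaves open.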