Created March 16, 2014 19:16
Ruby encoding inconsistencies
# encoding: utf-8
# Print the encoding of the string "123" produced four different ways.
puts "\nusing #{RUBY_DESCRIPTION}"
puts '"123"' + "\t\t=> " + "123".encoding.to_s
puts '"#{123}"' + "\t=> " + "#{123}".encoding.to_s
puts '"#{123.to_s}"' + "\t=> " + "#{123.to_s}".encoding.to_s
puts '123.to_s' + "\t=> " + 123.to_s.encoding.to_s
using ruby 1.9.3p545 (2014-02-24 revision 45159) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using ruby 2.0.0p451 (2014-02-24 revision 45167) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using jruby 1.7.11 (1.9.3p392) 2014-02-24 86339bb on Java HotSpot(TM) 64-Bit Server VM 1.7.0_11-b21 [darwin-x86_64]
"123" => UTF-8
"#{123}" => UTF-8
"#{123.to_s}" => UTF-8
123.to_s => US-ASCII

using rubinius 2.1.1 (2.1.0 be67ed17 2013-10-18 JI) [x86_64-darwin12.5.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using rubinius 2.2.6 (2.1.0 68d916a5 2014-03-10 JI) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
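In every run above, each result holds the same ASCII-only bytes; only the encoding label differs. A minimal sketch, assuming CRuby 1.9 or later, of why that is usually harmless in practice: ASCII-only strings in different encodings compare equal and concatenate cleanly, and when one operand contains non-ASCII text, that operand's encoding decides the result's encoding.

```ruby
# encoding: utf-8
# For ASCII-only content, a US-ASCII string and a UTF-8 string are
# interchangeable: they compare equal and concatenate without error.
ascii = 123.to_s   # US-ASCII on the MRI and Rubinius runs above
utf8  = "123"      # UTF-8 because of the magic comment

same_text  = (ascii == utf8)                    # true: ASCII-only strings are comparable across encodings
compatible = Encoding.compatible?(ascii, utf8)  # an Encoding object, not nil
joined     = ascii + "é"                        # the non-ASCII operand wins: result is UTF-8

puts "equal?     #{same_text}"
puts "compatible: #{compatible}"
puts "joined:     #{joined} (#{joined.encoding})"
```

Trouble only starts when code checks `encoding` explicitly, or when non-ASCII bytes meet an incompatible encoding.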
Thanks @headius, I will. But generally speaking, wouldn't it make more sense to honour the encoding setting when generating strings? That applies to Fixnum#to_s, but I would argue it should apply to any "native" to_s.
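`Fixnum#to_s` only takes a base argument, so today any encoding preference has to be applied after the call. A minimal sketch of the behaviour being asked for, written as a hypothetical helper `to_s_in` (not a Ruby API): it calls `to_s` and transcodes the result to `Encoding.default_internal` when that is set, otherwise to UTF-8 (the UTF-8 fallback is an assumption, not a Ruby default).

```ruby
# Hypothetical helper, not part of Ruby: stringify obj and transcode the
# result into target, which defaults to Encoding.default_internal when it
# is set, else UTF-8.
def to_s_in(obj, target = Encoding.default_internal || Encoding::UTF_8)
  str = obj.to_s
  return str if str.encoding == target
  # Raises Encoding::UndefinedConversionError if the text cannot be
  # represented in target (never the case for digits).
  str.encode(target)
end

puts to_s_in(123).encoding  # UTF-8 unless default_internal is set to something else
```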
@jorgelbg if you mean storage-wise, "123" takes 3 bytes whether it is US-ASCII or UTF-8 encoded. Unless I am looking at it from the wrong angle, it's all about consistency: when correct encoding matters in your app, getting strings in the expected encoding saves you from having to verify, change, or transcode encodings just to make your strings uniform.
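To make the storage point concrete, here is a minimal sketch, assuming CRuby 1.9 or later. It shows that the US-ASCII and UTF-8 versions of "123" share the same three bytes. It also shows that, for ASCII-only content, relabelling with `force_encoding` is enough to make strings uniform, with no transcoding needed.

```ruby
# encoding: utf-8
a = 123.to_s   # US-ASCII on the MRI runs above
u = "123"      # UTF-8 via the magic comment

puts a.bytesize           # 3
puts u.bytesize           # 3
puts a.bytes == u.bytes   # true: identical bytes, different label

# For ASCII-only content, relabelling is enough; dup first because
# force_encoding mutates its receiver.
relabelled = a.dup.force_encoding(Encoding::UTF_8)
puts relabelled.encoding         # UTF-8
puts relabelled.valid_encoding?  # true
```

`force_encoding` only changes the label, so it is safe here because the bytes are pure ASCII. For arbitrary input, `encode` is the transcoding operation.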