# encoding: utf-8
puts "\nusing #{RUBY_DESCRIPTION}"
puts '"123"' + "\t\t=> " + "123".encoding.to_s
puts '"#{123}"' + "\t=> " + "#{123}".encoding.to_s
puts '"#{123.to_s}"' + "\t=> " + "#{123.to_s}".encoding.to_s
puts '123.to_s' + "\t=> " + 123.to_s.encoding.to_s
using ruby 1.9.3p545 (2014-02-24 revision 45159) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using ruby 2.0.0p451 (2014-02-24 revision 45167) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using jruby 1.7.11 (1.9.3p392) 2014-02-24 86339bb on Java HotSpot(TM) 64-Bit Server VM 1.7.0_11-b21 [darwin-x86_64]
"123" => UTF-8
"#{123}" => UTF-8
"#{123.to_s}" => UTF-8
123.to_s => US-ASCII

using rubinius 2.1.1 (2.1.0 be67ed17 2013-10-18 JI) [x86_64-darwin12.5.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII

using rubinius 2.2.6 (2.1.0 68d916a5 2014-03-10 JI) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
Agreed! But in the case of Fixnum, what would be gained by respecting the encoding? ASCII is basically enough to represent numbers.
This would be worth filing as a JRuby issue, at least for us to investigate why we differ. My guess is that MRI is more aggressive in normalizing encodings to US-ASCII when combining multiple 7-bit strings together.
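One way to probe how differently tagged strings combine is to ask Ruby which encoding it would pick before concatenating. The snippet below is only a sketch: the variable names are illustrative and not part of the gist, and it covers the case of a 7-bit US-ASCII string joined with a non-ASCII UTF-8 string, not the interpolation path discussed above.

```ruby
# encoding: utf-8

# Illustrative names, not from the gist.
ascii_digits = 123.to_s      # the number-to-string conversion tags this US-ASCII
utf8_text    = "caf\u00e9"   # UTF-8 string containing one non-ASCII character

# Encoding.compatible? returns the encoding a combination of the two
# strings would use, or nil when they cannot be combined.
puts Encoding.compatible?(ascii_digits, utf8_text).inspect

# Concatenating a 7-bit US-ASCII string with a non-ASCII UTF-8 string
# succeeds without an explicit transcoding step.
combined = ascii_digits + utf8_text
puts combined.encoding
puts combined.valid_encoding?
```

Running the same probe on MRI and JRuby, and swapping in the interpolated forms from the script, would be one way to narrow down where the two implementations start to differ.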
Thanks @headius, I will. But generally speaking, wouldn't it make more sense to honour the encoding setting when generating strings? That applies to Fixnum#to_s, but I would argue the same for any "native" to_s.
@jorgelbg if you mean storage-wise, "123" will be 3 bytes whether it is US-ASCII or UTF-8 encoded. Unless I am looking at it from the wrong angle, it's all about consistency. When correct encoding matters in your app, getting the string encoding you expect saves you from verifying, changing, or transcoding encodings just to make your strings uniform.
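To make the extra step concrete, here is a minimal sketch of the kind of normalization an app ends up doing when it needs every string tagged UTF-8. `to_utf8` is a hypothetical helper, not something from the gist or from Ruby itself, and it assumes the input is valid in its current encoding.

```ruby
# encoding: utf-8

# Hypothetical helper (an assumption for illustration): return a string
# tagged UTF-8, transcoding only when the current tag is something else.
def to_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode(Encoding::UTF_8)
end

number_string = 123.to_s              # tagged US-ASCII
normalized    = to_utf8(number_string)

puts number_string.encoding           # encoding tag before normalization
puts normalized.encoding              # encoding tag after normalization
puts normalized.bytesize              # same 3 bytes either way
```

The bytes do not change for 7-bit content; only the encoding tag does, which is why this reads as a consistency concern and not a storage one.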
Normally, Ruby honours the script's `# encoding` comment in string literals. Why does a string literal with interpolation not respect it? I would say JRuby does a better job here. I believe it would make sense for `Fixnum#to_s` to honour the script encoding too?