@colinsurprenant
Created March 16, 2014 19:16
Ruby encoding inconsistencies
# encoding: utf-8
puts "\nusing #{RUBY_DESCRIPTION}"
puts '"123"' + "\t\t=> " + "123".encoding.to_s
puts '"#{123}"' + "\t=> " + "#{123}".encoding.to_s
puts '"#{123.to_s}"' + "\t=> " + "#{123.to_s}".encoding.to_s
puts '123.to_s' + "\t=> " + 123.to_s.encoding.to_s
using ruby 1.9.3p545 (2014-02-24 revision 45159) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
using ruby 2.0.0p451 (2014-02-24 revision 45167) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
using ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
using jruby 1.7.11 (1.9.3p392) 2014-02-24 86339bb on Java HotSpot(TM) 64-Bit Server VM 1.7.0_11-b21 [darwin-x86_64]
"123" => UTF-8
"#{123}" => UTF-8
"#{123.to_s}" => UTF-8
123.to_s => US-ASCII
using rubinius 2.1.1 (2.1.0 be67ed17 2013-10-18 JI) [x86_64-darwin12.5.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
using rubinius 2.2.6 (2.1.0 68d916a5 2014-03-10 JI) [x86_64-darwin13.1.0]
"123" => UTF-8
"#{123}" => US-ASCII
"#{123.to_s}" => US-ASCII
123.to_s => US-ASCII
@colinsurprenant
Author

Thanks @headius, I will. But generally speaking, wouldn't it make more sense to honour the encoding setting when generating strings? That applies to Fixnum#to_s, but I would argue the same for any "native" to_s.

@jorgelbg if you mean storage-wise, "123" takes 3 bytes whether it is US-ASCII or UTF-8 encoded. Unless I am looking at it from the wrong angle, it's all about consistency. When correct encoding matters in your app, getting the expected string encoding avoids having to verify, change, or transcode encodings to make your strings uniform.
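
For illustration, here is a minimal sketch of the kind of normalisation step described above. The helper name `to_utf8` is hypothetical and not part of the gist; it assumes the input is valid in its current encoding, and `String#encode` may raise for binary strings containing non-ASCII bytes.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the gist): return a UTF-8 copy of +str+,
# or +str+ itself when it is already tagged as UTF-8.
def to_utf8(str)
  return str if str.encoding == Encoding::UTF_8
  str.encode(Encoding::UTF_8)
end

number  = 123.to_s   # encoding tag depends on the interpreter (see output above)
literal = "123"      # follows the source encoding, UTF-8 here

puts number.bytesize == literal.bytesize   # same storage size either way
puts number == literal                     # ASCII-only content compares equal
puts to_utf8(number).encoding              # normalised encoding tag
```

The first two lines print `true` regardless of how `123.to_s` is tagged, which matches the point that only the encoding tag differs, not the bytes. The last line prints `UTF-8` once the string has gone through the helper.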