Created May 28, 2012 10:42
codepoints_vs_bytes.rb
# encoding: utf-8
german_string = "muß grösser sein"

# the encoding of a String
puts german_string.encoding.name # >> UTF-8

# the character count
puts german_string.size # >> 16

# the number of bytes in this UTF-8 string
puts german_string.bytesize # >> 18

# the individual bytes: ß and ö each take two bytes in UTF-8
puts german_string.bytes.to_a.join(" ") # >> 109 117 195 159 32 103 114 195 182 115 115 101 114 32 115 101 105 110

# the Unicode code points: exactly one per character
puts german_string.codepoints.to_a.join(" ") # >> 109 117 223 32 103 114 246 115 115 101 114 32 115 101 105 110
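
The two extra bytes come from `ß` and `ö`, which are single code points but two bytes each in UTF-8. The following is a minimal sketch that makes that mapping visible per character. `char_breakdown` is a hypothetical helper introduced here for illustration; it is not part of the original gist.

```ruby
# encoding: utf-8
german_string = "muß grösser sein"

# Hypothetical helper (not in the original gist): pairs each character
# with its code point and the bytes that encode it.
def char_breakdown(string)
  string.each_char.map do |char|
    [char, char.ord, char.bytes.to_a]
  end
end

breakdown = char_breakdown(german_string)

# Keep only the characters that need more than one byte.
multibyte = breakdown.select { |_char, _codepoint, bytes| bytes.size > 1 }

multibyte.each do |char, codepoint, bytes|
  hex = codepoint.to_s(16).upcase.rjust(4, "0")
  puts "#{char} U+#{hex} -> #{bytes.join(' ')}"
end
# >> ß U+00DF -> 195 159
# >> ö U+00F6 -> 195 182
```

Summing the per-character byte counts gives 18, matching `bytesize`, while the number of entries in `breakdown` is 16, matching `size`.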