Glyph | Unicode codepoint | UTF-8 code units | UTF-8 bytes | UTF-16 code units | UTF-16LE bytes |
---|---|---|---|---|---|
B | U+0042 | 1 | 0x42 | 1 | 0x42 0x00 |
ÿ | U+00FF | 2 | 0xC3 0xBF | 1 | 0xFF 0x00 |
☃ | U+2603 | 3 | 0xE2 0x98 0x83 | 1 | 0x03 0x26 |
💩 | U+1F4A9 | 4 | 0xF0 0x9F 0x92 0xA9 | 2 | 0x3D 0xD8 0xA9 0xDC |
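
The values in the table can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original gist: the helper names `hex_bytes` and `describe` are made up for illustration, and it assumes Python 3, where a `str` is a sequence of codepoints.

```python
# Glyphs taken from the table above.
GLYPHS = ["B", "ÿ", "☃", "💩"]


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated 0xNN values."""
    return " ".join(f"0x{b:02X}" for b in data)


def describe(glyph: str) -> dict:
    """Return the codepoint, code unit counts, and encoded bytes for one glyph."""
    utf8 = glyph.encode("utf-8")
    utf16le = glyph.encode("utf-16-le")  # explicit little-endian, so no BOM is added
    return {
        "codepoint": f"U+{ord(glyph):04X}",
        "utf8_units": len(utf8),            # a UTF-8 code unit is one byte
        "utf8_bytes": hex_bytes(utf8),
        "utf16_units": len(utf16le) // 2,   # a UTF-16 code unit is two bytes
        "utf16le_bytes": hex_bytes(utf16le),
    }


for g in GLYPHS:
    row = describe(g)
    print(" | ".join([g] + [str(v) for v in row.values()]))
```

Each printed line should match the corresponding table row. The last glyph needs two UTF-16 code units because U+1F4A9 lies outside the Basic Multilingual Plane and is encoded as a surrogate pair.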
Last active October 8, 2022 13:02
Examples of Unicode codepoints with different UTF-8 and UTF-16 byte counts. Try pasting these into your program to see if it can handle multi-byte characters.
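
If pasting by hand is inconvenient, the same check can be scripted. The following is a rough sketch under stated assumptions: `process` is a hypothetical placeholder for whatever code you want to exercise, and `check_round_trip` is an illustrative helper, not an established API.

```python
SAMPLES = ["B", "ÿ", "☃", "💩"]


def process(text: str) -> str:
    # Hypothetical placeholder: swap in the function you want to exercise.
    return text


def check_round_trip(samples, encoding="utf-8"):
    """Return the samples that fail to survive encode, decode, and process."""
    failures = []
    for s in samples:
        try:
            result = process(s.encode(encoding).decode(encoding))
        except UnicodeError:
            # The encoding cannot represent this glyph at all.
            failures.append(s)
            continue
        if result != s:
            failures.append(s)
    return failures


print(check_round_trip(SAMPLES, "utf-8"))    # no failures expected
print(check_round_trip(SAMPLES, "latin-1"))  # ☃ and 💩 cannot be encoded
```

With the identity placeholder, UTF-8 and UTF-16 should report no failures, while a single-byte encoding such as Latin-1 rejects the snowman and the emoji. Failures that appear only after substituting your own `process` point at code that mishandles multi-byte characters.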