@erikcorry
Created July 1, 2021 11:05
illegal-utf-8
check_illegal_utf_8 [244, 65, 48] // Continuation bytes too low (below 0x80).
check_illegal_utf_8 [244, 244, 48] // Continuation byte too high (above 0xbf).
check_illegal_utf_8 [48, 244] // Lead byte at end of input: continuation bytes missing.
check_illegal_utf_8 [0xc1, 0x9c] // Overlong encoding of backslash (0x5c).
check_illegal_utf_8 [0xc1, 0xbf] // Overlong encoding of DEL (0x7f).
check_illegal_utf_8 [0xe0, 0x9f, 0xbf] // Overlong encoding of character 0x7ff.
check_illegal_utf_8 [0xe0, 0x9f, 0xb9] // Overlong encoding of N'Ko exclamation mark.
check_illegal_utf_8 [0xf0, 0x82, 0x98, 0x83] // Overlong encoding of Unicode snowman.
check_illegal_utf_8 [0xed, 0xa0, 0x80] // 0xd800: First surrogate.
check_illegal_utf_8 [0xed, 0xbf, 0xbf] // 0xdfff: Last surrogate.
check_illegal_utf_8 [0xf4, 0x90, 0x80, 0x80] // 0x110000: First out-of-range value.
check_illegal_utf_8 [0xf5, 0x80, 0x80, 0x80] // All UTF-8 sequences starting with f5, f6 or f7 ...
check_illegal_utf_8 [0xf6, 0x80, 0x80, 0x80] // ... are out of the 0x10ffff range.
check_illegal_utf_8 [0xf7, 0x80, 0x80, 0x80]
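check_illegal_utf_8 ['x', 0x80, 'y'] // Stray continuation byte with no lead byte.
check_illegal_utf_8 ['x', 0xFF, 'y'] // 0xff never appears in UTF-8.
check_illegal_utf_8 ['x', 0xC0, 0x00, 'y'] // 0xc0 lead byte followed by NUL instead of a continuation byte.
check_illegal_utf_8 ['x', 0xC0, 0x80, 'y'] // Overlong encoding of NUL (as used by Modified UTF-8).
check_illegal_utf_8 ['x', 0xC0, 0x80, 'y', 0x80] // Same, plus a stray continuation byte at the end.
check_illegal_utf_8 ['x', 0xC0, 0x80, 'y', 0xC0] // Same, plus a lead byte at the end with no continuation.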
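
The definition of `check_illegal_utf_8` is not part of this snippet. As a rough illustration of what such a check might verify, here is a minimal Python sketch (not the original helper) that relies on Python's strict built-in UTF-8 decoder. The name `is_illegal_utf_8` and the handling of one-character strings such as `'x'` are assumptions made for this example.

```python
def is_illegal_utf_8(values):
    """Return True if the byte sequence is rejected by a strict UTF-8 decoder.

    `values` may mix integers (0..255) and one-character strings such as 'x',
    mirroring the lists in the calls above. This is a hypothetical helper.
    """
    data = bytes(v if isinstance(v, int) else ord(v) for v in values)
    try:
        data.decode("utf-8")  # Strict mode: raises on any malformed sequence.
    except UnicodeDecodeError:
        return True
    return False


# A few of the cases listed above, all of which should be rejected.
illegal_cases = [
    [0xe0, 0x9f, 0xbf],        # Overlong encoding of 0x7ff.
    [0xf0, 0x82, 0x98, 0x83],  # Overlong encoding of the snowman (0x2603).
    [0xed, 0xa0, 0x80],        # 0xd800: first surrogate.
    [0xf4, 0x90, 0x80, 0x80],  # 0x110000: first out-of-range value.
    ['x', 0xC0, 0x80, 'y'],    # Overlong encoding of NUL.
]
assert all(is_illegal_utf_8(case) for case in illegal_cases)

# The shortest-form encoding of the snowman is accepted.
assert not is_illegal_utf_8([0xe2, 0x98, 0x83])
```

Python's decoder rejects overlong forms, surrogates, and values above 0x10ffff, so it can serve as a reference when comparing against another implementation.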