Last active
May 4, 2022 05:16
Ragel machine definition for valid UTF-8 encodings
%%{
  utf8 =
    ( 0..127 | 192..223 128..191 | 224..239 128..191 128..191 | 240..247 128..191 128..191 128..191 )
    - ( 244 144..191 any any | 245..247 any any any ) # over U+10FFFF
    - ( 0xC0..0xC1 any ) # overlong 2-byte encodings
    - ( 224 128..159 any ) # overlong 3-byte encodings
    - ( 240 128..143 any any ) # overlong 4-byte encodings
    - ( 237 160..191 any ) # UTF-16 surrogate code points (U+D800..U+DFFF)
  ;
  main := utf8* ;
}%%
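To sanity-check the byte ranges in the machine above, here is a minimal Python sketch that mirrors them by hand. It is not generated by Ragel, and the function names `is_valid_utf8` and `python_accepts` are made up for this illustration. It assumes the machine is fed unsigned bytes, and it compares its verdict against Python's built-in strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hand-written mirror of the machine's byte ranges (illustrative only)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 127:                 # 0..127: single-byte (ASCII)
            i += 1
            continue
        if 192 <= b0 <= 223:          # 192..223 128..191
            length = 2
        elif 224 <= b0 <= 239:        # 224..239 128..191 128..191
            length = 3
        elif 240 <= b0 <= 247:        # 240..247 128..191 128..191 128..191
            length = 4
        else:
            return False              # stray continuation byte or 248..255
        if i + length > n:
            return False              # sequence truncated at end of input
        tail = data[i + 1:i + length]
        if any(not 128 <= b <= 191 for b in tail):
            return False              # every trailing byte must be 128..191
        b1 = tail[0]
        # The subtracted sets, in the same order as in the machine:
        if b0 == 244 and b1 >= 144:   # over U+10FFFF
            return False
        if b0 >= 245:                 # over U+10FFFF
            return False
        if b0 in (0xC0, 0xC1):        # overlong 2-byte encodings
            return False
        if b0 == 224 and b1 <= 159:   # overlong 3-byte encodings
            return False
        if b0 == 240 and b1 <= 143:   # overlong 4-byte encodings
            return False
        if b0 == 237 and b1 >= 160:   # surrogates U+D800..U+DFFF
            return False
        i += length
    return True


def python_accepts(data: bytes) -> bool:
    """Reference verdict from Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Comparing `is_valid_utf8(s) == python_accepts(s)` over every two-byte input, plus a handful of boundary cases such as `b"\xf4\x8f\xbf\xbf"` (U+10FFFF) and `b"\xed\xa0\x80"` (U+D800), is a cheap way to gain confidence in the subtracted ranges.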
Author
CAFxX
commented
May 2, 2022