@CAFxX
Last active May 4, 2022 05:16
Ragel machine definition for valid UTF-8 encodings
%%{
utf8 =
( 0..127 | 192..223 128..191 | 224..239 128..191 128..191 | 240..247 128..191 128..191 128..191 )
- ( 244 144..191 any any | 245..247 any any any ) # over U+10FFFF
- ( 192..193 any ) # overlong 2-byte encodings
- ( 224 128..159 any ) # overlong 3-byte encodings
- ( 240 128..143 any any ) # overlong 4-byte encodings
- ( 237 160..191 any ) # UTF-16 surrogate code points (U+D800..U+DFFF)
;
main := utf8* ;
}%%
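
As a cross-check on the byte ranges above, here is a minimal Python sketch of the same acceptance rule. The function name `is_valid_utf8` is hypothetical and not part of the gist. Instead of subtracting the invalid sets, it folds them into per-lead-byte bounds on the second byte, which should accept the same language as `main := utf8*`.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is a sequence of well-formed UTF-8 encodings.

    Hypothetical helper mirroring the Ragel definition: the subtracted
    ranges become tighter bounds on the byte that follows the lead byte.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 127:                       # 0..127: single-byte (ASCII)
            i += 1
            continue
        # Pick the sequence length and the allowed range of the second byte.
        if 194 <= b0 <= 223:                # 192..193 would be overlong
            length, lo, hi = 2, 128, 191
        elif b0 == 224:                     # 224 128..159 is overlong
            length, lo, hi = 3, 160, 191
        elif b0 == 237:                     # 237 160..191 is a surrogate
            length, lo, hi = 3, 128, 159
        elif 225 <= b0 <= 239:
            length, lo, hi = 3, 128, 191
        elif b0 == 240:                     # 240 128..143 is overlong
            length, lo, hi = 4, 144, 191
        elif 241 <= b0 <= 243:
            length, lo, hi = 4, 128, 191
        elif b0 == 244:                     # 244 144..191 is over U+10FFFF
            length, lo, hi = 4, 128, 143
        else:                               # 128..193 and 245..255 never lead
            return False
        if i + length > n:                  # truncated sequence
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        # Any remaining bytes are plain continuation bytes (128..191).
        for b in data[i + 2:i + length]:
            if not 128 <= b <= 191:
                return False
        i += length
    return True
```

On any input this should agree with Python's strict `bytes.decode("utf-8")`, which likewise rejects overlong forms, surrogates, and values above U+10FFFF.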