@SeijiEmery
Created February 24, 2016 21:24
full tokenizer regex for c-like languages
Matches and splits string, int, hex, and floating-point literals and C-style identifiers,
and greedily matches all symbols (so '+', '+=', and '++' come out as distinct tokens).
Also includes two operators that I wish most languages _did_ have (you can remove these):
||= and &&=, which would simplify the Lua/JS default-value idiom foo = foo || default (written foo ||= default).
('[^']*'|"[^"]*"|[_a-zA-Z][_a-zA-Z0-9]*|0x[a-fA-F\d]+|-?\d+(?:\.\d+(?:[eE]\d+)?)?|//|/\*|\*/|\|\|=?|&&=?|\+\+|\-\-|\.\.|==|\->|[\+\-\*/%^\|&<>]=|[\?#\[\]\(\){}:;,\.\|&\+\-\*/%^<>=])
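
A minimal sketch of how the pattern could be used, assuming Python's `re` module (the gist itself does not name a language or engine). The `tokenize` helper and the sample inputs are hypothetical. The pattern is written with the decimal point in the float branch escaped as `\.`, so that `1+2` splits into three tokens rather than matching as one number.

```python
import re

# The tokenizer pattern, with the decimal point escaped (\.) so that it
# only matches a literal '.' between the integer and fractional digits.
TOKEN_RE = re.compile(
    r"""('[^']*'|"[^"]*"|[_a-zA-Z][_a-zA-Z0-9]*|0x[a-fA-F\d]+|-?\d+(?:\.\d+(?:[eE]\d+)?)?|//|/\*|\*/|\|\|=?|&&=?|\+\+|\-\-|\.\.|==|\->|[\+\-\*/%^\|&<>]=|[\?#\[\]\(\){}:;,\.\|&\+\-\*/%^<>=])"""
)


def tokenize(source):
    """Return every token the pattern finds, in order (hypothetical helper).

    Whitespace and any character the pattern does not cover are skipped
    silently, because findall() just scans ahead to the next match.
    """
    # With exactly one capturing group, findall() returns a list of strings.
    return TOKEN_RE.findall(source)


print(tokenize("foo ||= x + 0x1F"))   # ['foo', '||=', 'x', '+', '0x1F']
print(tokenize("i++; a += 1.5e3"))    # ['i', '++', ';', 'a', '+=', '1.5e3']
```

One thing to watch for when feeding these tokens to a parser: the number branch `-?\d+` is tried before the operator branches, so `x-1` comes out as `x`, `-1` and not as `x`, `-`, `1`.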