We're writing support for the Neon language in PhpStorm and we want it heavily covered by automated unit tests. The first step in processing any programming language is a lexical analyzer, or lexer for short, which takes source code and splits it into the individual words of the language, called tokens (symbols, punctuation and whitespace count as tokens too).
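To make the idea of tokens concrete, here is a minimal hand-written sketch. It is not the Neon lexer: the ToyLexer, ToyToken and ToyTokenType names and the three token categories are made up for illustration, and a real language needs many more rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token categories for a toy language (not the real Neon token set).
enum ToyTokenType { WORD, SYMBOL, WHITESPACE }

// A token is a category plus the exact slice of source text it covers.
final class ToyToken {
    final ToyTokenType type;
    final String text;

    ToyToken(ToyTokenType type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return type + "(" + text + ")";
    }
}

final class ToyLexer {
    // Splits the source into tokens; every character ends up in exactly one token.
    static List<ToyToken> tokenize(String source) {
        List<ToyToken> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int start = pos;
            ToyTokenType type = classify(source.charAt(pos));
            pos++;
            // Words and whitespace extend over a run of same-category characters;
            // a symbol is always a single character.
            while (type != ToyTokenType.SYMBOL
                    && pos < source.length()
                    && classify(source.charAt(pos)) == type) {
                pos++;
            }
            tokens.add(new ToyToken(type, source.substring(start, pos)));
        }
        return tokens;
    }

    private static ToyTokenType classify(char c) {
        if (Character.isWhitespace(c)) return ToyTokenType.WHITESPACE;
        if (Character.isLetterOrDigit(c) || c == '_') return ToyTokenType.WORD;
        return ToyTokenType.SYMBOL;
    }
}
```

Calling ToyLexer.tokenize("name: Neon") yields four tokens: a word, a symbol, a whitespace token and another word. Because nothing is thrown away, gluing the token texts back together reproduces the original source.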
Because parsing strings manually is tedious and boring, clever people made tools to help us a little bit. Flex is the de facto standard tool for writing lexers, but we'll use its Java counterpart, JFlex. In a flex file you describe patterns for the different types of tokens and associate a piece of code with each pattern. Have a look at the flex file for the very simple properties configuration language, and refer to one of the many tutorials on how to write a JFlex lexer (e.g. this).
You'll need to compile the .flex file into a Java class using the JFlex program. Ideally, use the JFlex support plugin, which integrates this step into the build process in IDEA.
For more details, check the Implementing a Lexer page in the IDEA documentation, or just have a look at existing lexers: 1 2.
Unit tests for lexers are actually pretty simple. You just throw a piece of code at the lexer and check that it split the words correctly and recognized their types.
There is again a good example in the properties language, or our Neon example, which tries to be a little bit cleaner ;)
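The check itself can be sketched in a few lines: dump the token stream into a readable string, one token per line, and compare it with what you expect. The LexerCheck class, its token type names and the regex-based stand-in lexer below are all assumptions made so the sketch runs on its own; in a real plugin test the JFlex-generated lexer would take the place of the regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexerCheck {
    // Hypothetical token types; each name doubles as a named group in the pattern below.
    private static final String[] TYPES =
            {"COMMENT", "WHITESPACE", "NUMBER", "IDENTIFIER", "COLON", "BAD"};

    // Stand-in lexer: alternatives are tried in order, and BAD catches any leftover character.
    private static final Pattern TOKENS = Pattern.compile(
            "(?<COMMENT>#[^\\n]*)"
            + "|(?<WHITESPACE>\\s+)"
            + "|(?<NUMBER>\\d+)"
            + "|(?<IDENTIFIER>[A-Za-z_]\\w*)"
            + "|(?<COLON>:)"
            + "|(?<BAD>.)",
            Pattern.DOTALL);

    // Renders the token stream as "TYPE 'text'" lines, with newlines shown as \n.
    static String dump(String source) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKENS.matcher(source);
        while (m.find()) {
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    out.append(type).append(" '")
                       .append(m.group().replace("\n", "\\n"))
                       .append("'\n");
                    break;
                }
            }
        }
        return out.toString();
    }

    // Fails loudly, showing both versions, when the lexer output differs from the expectation.
    static void assertTokens(String source, String expected) {
        String actual = dump(source);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected:\n" + expected + "actual:\n" + actual);
        }
    }
}
```

A test case is then a single call such as LexerCheck.assertTokens("port: 80", "IDENTIFIER 'port'\nCOLON ':'\nWHITESPACE ' '\nNUMBER '80'\n"). Comparing whole dumps keeps each case short and makes a failure easy to read, because the message shows exactly which token went wrong.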
Note that code samples for lexers are usually pretty small, so the easiest way is to put them all into one big file. We wanted to have each case in a separate file, but that is inconvenient (though it is useful for parser tests).
Run the tests quickly in IDEA with Ctrl+Shift+F10, or right-click and choose Run LexerTest.java. If you used @Test annotations for the test methods, IDEA will recognize the class as a test case and execute it properly.