TL;DR: Regression tests for the SuperCollider lexer, parser, and interpreter.
Brian Heim, 28 Jan 2017
This document proposes the form and high-level implementation details of a series of rigorous regression tests for the SuperCollider lexer, parser, and interpreter. Although these are treated as three separate components for the purposes of testing, testing one component through the compiled language naturally involves some testing of the other two.
Lexer: essentially, the method yylex() in PyrLexer.cpp. The component that handles character-to-character understanding of an input text, identifying tokens for the parser.
Parser: the method yyparse(), generated by the Bison file lang11d. The component that handles token-to-token understanding of an input text, identifying parse nodes for the interpreter.
Interpreter (or compiler): the component that transforms the AST generated by the parser into a sequence of byte codes to be interpreted. The code for this component lies in PyrParseNode.cpp.
There are three general approaches to testing:
- Full component stress-testing
- Targeted sub-component stress-testing
- Pseudorandom testing
Full component stress-testing involves passing the component input consisting of all possible strings of a given length. In addition to verifying the component's behavior over a clearly bounded input space, this stress-tests garbage handling.
Targeted sub-component stress-testing means testing a specialized behavior of the component, again with all possible strings of a given length, but limiting the alphabet and/or grammar so that a longer string length can be tested.
Pseudorandom testing means traversing a very large input space by passing a random (usually uniformly distributed) subset of the possible input strings. Imagine the string as a number written in base (alphabet size), with a maximum value of (alphabet size)^(string length). For example, if the alphabet size is 16 and the desired string length is 6, a good pseudorandom test might involve starting with the string represented by 0x000000, then testing every "number" that is a multiple of a number coprime with the alphabet size, such as 11 or 13.
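As a sketch (the alphabet, length, and stride here are all placeholders, not part of any existing test class), converting a numeric index into its string and stepping by multiples of the coprime number might look like this in sclang:

```supercollider
(
var alphabet = "0123456789abcdef";   // placeholder 16-character alphabet
var stringLen = 6;
var total = (alphabet.size ** stringLen).asInteger;   // (alphabet size)^(string length)
var stride = 11;                      // coprime with the alphabet size
var indexToString = { |index|
    var str = String.new;
    stringLen.do {
        str = str.add(alphabet[index % alphabet.size]);
        index = index.div(alphabet.size);
    };
    str
};
// visit only the "numbers" that are multiples of the stride
0.forBy(total - 1, stride, { |i| indexToString.(i) /* ...feed to the test... */ });
)
```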
A reasonable number of inputs for any stress test is (in the author's opinion) 10-20 million. This results in the following (soft) limits on alphabet size, assuming there are no limitations on grammar (i.e., all combinations are to be tested):
String length | Max alphabet size (10 million tests) | Max alphabet size (20 million tests) |
---|---|---|
1 | 10000000 | 20000000 |
2 | 3162 | 4472 |
3 | 215 | 271 |
4 | 56 | 67 |
5 | 25 | 29 |
6 | 15 | 16 |
7 | 10 | 11 |
8 | 7 | 8 |
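These soft limits come from requiring (alphabet size)^(string length) ≈ (number of tests), i.e. alphabet size ≈ (number of tests)^(1 / string length); rounding to the nearest integer reproduces the table:

```supercollider
// Approximate maximum alphabet sizes for 10 and 20 million tests.
(1..8).do { |len|
    var tenM = (10e6 ** len.reciprocal).round.asInteger;
    var twentyM = (20e6 ** len.reciprocal).round.asInteger;
    "length %: % / %".format(len, tenM, twentyM).postln;
};
```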
For the lexer, the alphabet is character literals in the range -128 to 127, excluding 0 (a sketch of this alphabet follows the list below).
- Full component test of strings up to 3 characters in length
- Sub-component test of literal recognition (symbols, floats, radix floats, etc.) of strings 4+ characters in length. (By my reckoning, the longest relevant string worth testing is 7 characters: -1.1e-5)
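A minimal sketch of constructing this alphabet in sclang; the variable name is illustrative, and it assumes .asAscii wraps negative values to the corresponding unsigned byte:

```supercollider
(
// Full-component alphabet: every nonzero signed byte value, as a Char.
var lexerAlphabet = (-128..127).reject({ |n| n == 0 }).collect({ |n| n.asAscii });
// A full test of length-3 strings is 255 ** 3, about 16.6 million inputs,
// which sits inside the 10-20 million budget above.
[lexerAlphabet.size, lexerAlphabet.size ** 3].postln; // -> [255, 16581375.0]
)
```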
For the parser, the alphabet is any symbol recognized by Bison, plus some duplicates to test more complex behavior. The specific alphabet is TBD; it includes many of the 41 tokens explicitly defined at the top of lang11d, plus []{}();, and maybe more. (A guess at such an alphabet is sketched after the list below.)
- Full component tests of strings up to length 4
- Sub-component tests??
- Pseudorandom testing of strings up to length 10 (?)
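Since the specific alphabet is TBD, the following is only a guess at what it might contain: token spellings rather than single characters, with candidate inputs built by joining tokens with spaces.

```supercollider
(
// Hypothetical parser alphabet: token spellings, not single characters.
var parserAlphabet = [
    "var", "arg", "classvar", "const", "nil", "true", "false",
    "x", "y",            // two variable names
    "Object", "Array",   // two real class names
    "[", "]", "{", "}", "(", ")", ";", ",", ".", "=", "+", "-", "|", "..", "..."
];
// A candidate input of length 3:
["var", "x", ";"].join(" ").postln; // -> "var x ;"
)
```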
The compiler has no specific alphabet. Rather, these tests will reuse input spaces from the lexer and parser tests and surround them with { }.def.code to get the byte code array.
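A sketch of one plausible reading of that wrapping (the input string here is a placeholder):

```supercollider
(
// Wrap the raw input in a function literal and ask for its byte codes.
var input = "1 + 1";   // placeholder: one string drawn from the lexer/parser input space
var result = ("{ " ++ input ++ " }.def.code").interpret;
result.postln;         // an Int8Array of byte codes, or nil if compilation failed
)
```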
In an output file, store the result of the regression test for comparison with pre-validated output. The storage format should include some sort of compression, since many strings in the lexer and parser tests are likely to be garbage and fail, producing nil. One idea is to count the number of consecutive strings that produce the same result and encode that count as a base-10 number at the end of the current output line. An output-comparing utility method would have to be written (and tested!) to properly decode this.
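A minimal sketch of the run-length idea; the helper name and output format are hypothetical:

```supercollider
(
// Collapse runs of identical result strings into "<value> <count>" lines.
var compress = { |results|
    var lines = List.new;
    var current = results.first, count = 0;
    results.do { |r|
        if (r == current) {
            count = count + 1;
        } {
            lines.add(current ++ " " ++ count);
            current = r;
            count = 1;
        };
    };
    lines.add(current ++ " " ++ count);
    lines
};
compress.(["nil", "nil", "nil", "Integer", "nil"]).postln;
// three output lines: "nil 3", "Integer 1", "nil 1"
)
```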
The current idea is to give the input string as a hex sequence (to make sure that non-printing characters are correctly shown), followed by the class of the result of interpretation or compilation. In the case of the lexer, this could be both the class of the object returned by .interpret and the result of .asString called on that object.
Strings should be encoded as strings of hex character values to ensure they do not interfere with delimiters, and so that non-printing characters are made visible.
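A sketch of such an encoding, assuming two hex digits per character (the helper name is illustrative; the bitAnd guards against negative char values):

```supercollider
(
// Encode each character as two hex digits so delimiters and non-printing
// characters cannot corrupt the output format.
var hexEncode = { |str|
    str.ascii.collect({ |n| n.bitAnd(16rFF).asHexString(2) }).join
};
hexEncode.("a c").postln; // -> "612063"
)
```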
For the main test, the string should be interpreted (via .interpret) in three ways:
- first, the string alone;
- second, the string prefixed by a known and easily identified expression;
- third, the string suffixed by a known and easily identified expression.
Other tests do not need to be that rigorous.
Purpose: since we cannot directly test the output of the lexer without significant effort, we can use this to gain some insight into the lexer's behavior. Compare the results of these tests when run on a few test strings:
Test String | str | "\\before;" ++ str | str ++ ";\\after" |
---|---|---|---|
"//" | nil | before | nil |
"10" | 10 | 10 | after |
"abc" | nil | nil | nil |
"\\\'" | (empty symbol) | (empty symbol) | (empty symbol) |
Thoughts: we could also suffix with "\after" to catch inoperative strings like " ", or surround with { } (but this gives less information). How many cases should we limit ourselves to?
Component tests: make little grammars to test radix notation, accidental notation, hex notation, exponential notation, and unholy mixtures of those. Testing should focus on these because the interactions among the notations are complicated.
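For instance (purely illustrative; the real alphabets are still to be chosen), a targeted literal-notation alphabet might be:

```supercollider
(
// Hypothetical restricted alphabet: digits, hex digits, radix/hex/accidental/
// exponent markers, sign, and decimal point.
var literalAlphabet = "0123456789abcdefrsxE.-+";
// 23 characters allows full testing of strings up to length 5:
(literalAlphabet.size ** 5).postln; // -> 6436343.0, about 6.4 million inputs
)
```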
Pseudorandom tests: maybe not needed for the lexer, but could lead to something interesting.
Error handling: some inputs will cause an error when interpreted, not lexed. This should be handled by surrounding the .interpret call with a try-catch. If the catch block is entered, set the result to some unique identifier like \error and use that as the recorded output.
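A sketch of that error handling; \error follows the text above, while the helper name is hypothetical:

```supercollider
(
// Interpret one candidate string, trapping errors raised during evaluation.
var safeInterpret = { |str|
    try {
        str.interpret.class
    } { |err|
        \error   // unique identifier recorded when the catch block is entered
    }
};
safeInterpret.("1 + 1").postln;   // -> Integer
safeInterpret.("nil.foo").postln; // -> error (the runtime error is trapped)
)
```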
Tokens from Bison:
%token NAME INTEGER SC_FLOAT ACCIDENTAL SYMBOL STRING ASCII PRIMITIVENAME CLASSNAME CURRYARG
%token VAR ARG CLASSVAR SC_CONST
%token NILOBJ TRUEOBJ FALSEOBJ
%token PSEUDOVAR
%token ELLIPSIS DOTDOT PIE BEGINCLOSEDFUNC
%token BADTOKEN INTERPRET
%token BEGINGENERATOR LEFTARROW WHILE
%left ':'
%right '='
%left BINOP KEYBINOP '-' '<' '>' '*' '+' '|' READWRITEVAR
%left '.'
%right '`'
%right UMINUS
The alphabet should have at least two variable names and two real class names to test multiple declarations, and should also include punctuation characters not shown here.
We can probably test strings up to length 4; we might want to do the same tests both raw and within { }.
Some important methods here are .interpret, .compile, and .def.code. Which to use when is still up in the air.
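For reference, the three calls behave as follows on a trivial input (this is standard sclang behavior, not something specific to the proposed tests):

```supercollider
"1 + 1".interpret;   // -> 2 (evaluates the string)
"1 + 1".compile;     // -> a Function (compiled but not yet evaluated)
{ 1 + 1 }.def.code;  // -> an Int8Array of byte codes for the function
```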
A lot can be reused among these three sets of tests; the abstracted inputs are essentially the alphabet, the string length, and the method used to evaluate each string.
There should also be methods that test the string prefixed, suffixed, and enclosed with given input strings.
The input string should be represented in a compressed format that appears uniform to the human eye; hex seems like a logical choice. Each output line would then pair the hex-encoded input string with its <output value(s)>.
Pseudocode for one lexer test:
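A sketch in sclang; all names are hypothetical, and the prefix/suffix strings and hex encoding follow the earlier sections:

```supercollider
(
// One lexer test: evaluate the string alone, prefixed, and suffixed,
// then record the hex-encoded input alongside the three result classes.
var testOneString = { |str|
    var evaluate = { |code|
        try { code.interpret.class } { \error }
    };
    var results = [
        evaluate.(str),                 // the string alone
        evaluate.("\\before;" ++ str),  // prefixed by a known expression
        evaluate.(str ++ ";\\after")    // suffixed by a known expression
    ];
    str.ascii.collect({ |n| n.bitAnd(16rFF).asHexString(2) }).join + results.join(" ")
};
testOneString.("10").postln; // -> "3130 Integer Integer Symbol"
)
```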
Pseudocode for testAllCombinationsWithInterpret:
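A sketch of the combination loop; the alphabet, string length, and per-string test function are parameters, and the index-to-string conversion treats each string as a number in base (alphabet size):

```supercollider
(
// Visit every string of a given length over the alphabet, in counting order.
var testAllCombinationsWithInterpret = { |alphabet, stringLen, func|
    var total = (alphabet.size ** stringLen).asInteger;
    total.do { |index|
        var str = String.new;
        stringLen.do {
            str = str.add(alphabet[index % alphabet.size]);
            index = index.div(alphabet.size);
        };
        func.(str);   // e.g. the single-string test above, appended to the output file
    };
};
// example: all 9 length-2 strings over a 3-character alphabet
testAllCombinationsWithInterpret.("abc", 2, { |s| s.postln }); // aa, ba, ca, ab, ...
)
```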
Similar ones would then be written for .compile and .def.code.
One problem with the above: what if "for each possible string" is too complex a loop to house in this one function? (I think it can be done iteratively, though; just check for rollover/overflow.)
It is important to be able to switch out parameters easily: each test + stringLength combination should be its own unit/file.
Other convenience ideas: