Blog 2025/4/9
Symbolic expressions are a simple syntax for representing trees. You may recognize them as the core of the syntax of Scheme and Common Lisp.
This parser starts with a trick from Peter Norvig: surround each ( and ) with an extra space, which lets you simply .split() the text into tokens.
However, that approach breaks on any string literals which include spaces.
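Here is a minimal sketch of the trick and of where it falls over. The function name tokenize is my own choice for illustration, not necessarily what the script uses.

```python
def tokenize(text):
    """Pad every paren with spaces, then split on whitespace."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


print(tokenize("(name message)"))
# ['(', 'name', 'message', ')']

# The weakness: a string literal containing a space is torn in two.
print(tokenize('(string "hello world")'))
# ['(', 'string', '"hello', 'world"', ')']
```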
We can fix this by using a string literal regex (see my previous post) to first break the text up into string and non-string chunks, then applying Norvig's .split() to the non-string chunks only.
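A rough sketch of that chunking step follows. The regex is an assumption on my part (double-quoted literals with backslash escapes) and may differ from the one in the previous post; STRING_RE and tokenize are illustrative names.

```python
import re

# Assumed pattern: a double-quoted literal that may contain backslash escapes.
STRING_RE = re.compile(r'("(?:[^"\\]|\\.)*")')


def tokenize(text):
    tokens = []
    # Because the pattern has one capturing group, re.split keeps the matches:
    # even indices are non-string chunks, odd indices are string literals.
    for i, chunk in enumerate(STRING_RE.split(text)):
        if i % 2 == 1:
            tokens.append(chunk)  # keep the string literal whole
        else:
            tokens.extend(chunk.replace("(", " ( ").replace(")", " ) ").split())
    return tokens


print(tokenize(r'(string "I said \"hello\" to the cat")'))
# ['(', 'string', '"I said \\"hello\\" to the cat"', ')']
```

Since string literals never reach the paren-padding step, any parens inside a string are left alone too.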
Throw in a little recursion to build the nested lists and catch any unbalanced parens, and we're done!
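The recursive step might look something like the sketch below. It takes an already-tokenized list and returns one entry per top-level expression; the names parse and read_list, the use of SyntaxError, and the list-of-expressions return shape are assumptions for illustration, not necessarily how parse_exprs.py does it.

```python
def parse(tokens):
    """Turn a flat token list into nested lists, one per top-level expression."""
    pos = 0

    def read_list():
        # Called just after a "(" has been consumed; reads up to its matching ")".
        nonlocal pos
        items = []
        while pos < len(tokens):
            token = tokens[pos]
            pos += 1
            if token == "(":
                items.append(read_list())  # recurse for a nested list
            elif token == ")":
                return items
            else:
                items.append(token)
        # Ran out of tokens before the list was closed.
        raise SyntaxError("missing ')'")

    exprs = []
    while pos < len(tokens):
        token = tokens[pos]
        pos += 1
        if token == "(":
            exprs.append(read_list())
        elif token == ")":
            # A close paren at the top level has nothing to match.
            raise SyntaxError("unexpected ')'")
        else:
            exprs.append(token)
    return exprs


print(parse("( a ( b c ) )".split()))
# [['a', ['b', 'c']]]
```

Both kinds of imbalance surface naturally: a missing ) exhausts the tokens inside read_list, and an extra ) shows up at the top level where no list is open.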
Demos:
$ cat exprs1.txt
(statement (return (string "I said \"hello\" to the cat")))
$ ./parse_exprs.py exprs1.txt
['statement', ['return', ['string', '"I said \\"hello\\" to the cat"']]]
$ cat exprs2.txt
(vardecl (type pointer (type char)) (name message) (value (string "I said \"hello\" to the cat")))
$ ./parse_exprs.py exprs2.txt
['vardecl',
 ['type', 'pointer', ['type', 'char']],
 ['name', 'message'],
 ['value', ['string', '"I said \\"hello\\" to the cat"']]]