Skip to content

Instantly share code, notes, and snippets.

@chrisdone
Last active February 9, 2022 11:50
Show Gist options
  • Save chrisdone/fd6c6f6a8c5b5d4d3c3f91289343629f to your computer and use it in GitHub Desktop.
Save chrisdone/fd6c6f6a8c5b5d4d3c3f91289343629f to your computer and use it in GitHub Desktop.
Good parser messages with Parsec

Intro

I've been working on a parser for a Haskell-like syntax called Duet. In the implementation I've taken particular care to make an awesome tokenizer and parser that is super helpful to learners.

Jasper Van der Jeugt made a talk about producing good error messages recently, which coincides nicely with my parallel work on this. So I thought I'd also share my results. I don't have time to document in detail my approach, as I'm using that spare time to work on this project, but I can show off some examples.

Example error messages

I'll list some example inputs and parse errors, and compare with GHC's parsing of Haskell:

x =
unexpected end of input
expecting ‘case’, lambda expression (e.g. \x -> x), if expression
(e.g. ‘if p then x else y’), infix expression (e.g. x * y), function
expression, variable (e.g. ‘foo’, ‘id’, etc.), character (e.g. 'a'),
string (e.g. "a"), integer (e.g. 42, 123), decimal (e.g. 42, 123) or
constructor (e.g. Just)

GHC: parse error (possibly incorrect indentation or mismatched brackets)

The above demonstrates the standard Parsec "unexpected/expected" error message, that builds up a set of expectations based on your parser.

x = if
unexpected end of input
expecting condition expresion of if-expression

GHC: parse error (possibly incorrect indentation or mismatched brackets)

This demonstrates specific expectations to sub-parts of an expression.

x = f (x * )
unexpected closing parenthesis ‘)’
expecting right-hand side of ‘*’ operator

This demonstrates customizing the error message based on what the user has input (the * operator). Duet doesn't have sections.

x = f 'a b c
unexpected character: you wrote
'a b c…
but only one character is allowed inside single quotes, like this:
'a'
Perhaps you forgot to put the closing single quote?
You may also have meant to use double quotes for text, e.g.
"a b c"

GHC: Syntax error on 'a

This is particularly detailed; it shows what was written, what is expected, and what the user might have meant.

x = f (if x
          then undefined
           )
unexpected closing parenthesis ‘)’
expecting function arguments, infix operator or ‘else’ keyword for
if-expression

GHC: parse error on input ‘)’

if/else is a tricky thing to parse and retain decent error messages, but the above holds up well.

x = f (x*)
unexpected operator ‘*’, there should be spaces before and after operators.

In Duet, all operators must have a space before and afterwards. This lets us write 10 - -5 without issue.

x = f (x * -5 + 2)
unexpected operator ‘+’. When more than one operator is used
in the same expression, use parentheses, like this:
(x * -5) + ...
Or like this:
x * (-5 + ...)

GHC:

Precedence parsing error
        cannot mix ‘*’ [infixl 7] and prefix `-' [infixl 6] in the same infix expression

(This is one of GHC's good parser messages).

In Duet, there is no operator precedence. All operator combinations must be parenthesized. This makes it easier for learners to read code, and easier to edit. It makes the parser simple, but that's not a major leg-up.

In this error message, I use the inputed code to suggest two alternative ways to make their code correct.

x = f 42. x
unexpected " "
expecting decimal component, e.g. 42.0

This is a tokenization message, so is this one:

x = k -k
unexpected "k"
expecting number (e.g. 123) or space after operator ‘-’

This one is particularly good, because it tells you two ways to correct your program.

Implementation

There are a few key points in achiving a decent parser like this:

  1. Tokenize before you parse.
  2. Avoid try (also mentioned in Jasper's post, and before that Brent Yorgey's). I used try in my tokenizer on single strings e.g. then so that if you call a variable the, the keyword doesn't greedily try to take it. The parser itself doesn't use try at all.
  3. Describe everything you're parsing. That is, wrap everything in (<?>).
  4. Make custom parse conditions to handle common user errors. Parsec gives you a large degree of freedom to do this.
  5. Test BAD CODE on your tokenizer/parser. Don't just feed correct code into your parser while working. Give it garbage or "nearly right" things and use the feedback you get to guide your writing of the parser.

I think you have to do more than describe the grammar to have a good parser, it should anticipate the user's mistakes.

@sboosali
Copy link

sboosali commented May 2, 2017

@FranklinChen imo, relevant error message are too hard for machine learning. a long list of heuristics (built on "tree regexes" or something) would go a long way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment