Skip to content

Instantly share code, notes, and snippets.

@tel
Created November 20, 2016 19:47
Show Gist options
  • Save tel/982ea662350e6545c21189fee966bef5 to your computer and use it in GitHub Desktop.
Save tel/982ea662350e6545c21189fee966bef5 to your computer and use it in GitHub Desktop.
Indentation sensitive parser combinators

Indentation sensitive parser combinators

An indentation sensitive parser combinator language is one that helps you express ideas like "this parse only succeeds if it's within the current indentation block". The concept is somewhat small and elegant and is implemented in a few libraries. In this writeup, I will use examples from indents.

Background and goal

The direct goal will be to write the sameOrIndented parser combinator with a type like (but not identical to)

sameOrIndented :: Parser s u () -- aspirational, not accurate

It fails whenever the current parse is at an indentation level less than a local "reference" level which is set by withPos

withPos :: Parser s u a -> Parser s u a -- aspirational, not accurate

The idea being that you wrap a parse with withPos after parsing the indentation whitespace and the "inner" parse will have its reference set to the location where withPos was applied. It makes little sense to call sameOrIndented (or other combinators like it) outside of a withPos block but the types do not prohibit it.

Context for indentation

Indentation-sensitive parses must carry with them information about the current "reference" indentation level. To do so, we use a monad transformer with a State SourcePos layer. SourcePos is the Parsec type describing a location in the source text during parsing.

The function of withPos is to set this reference context locally. Now that we have State effects to work with this is easy, we use the Parsec monad to get the "current" location then temporarily store it in the State monad during the inner computation

withPos run = do
  here <- getPosition -- Parsec effect
  prev <- get -- State effect, stored to restore after `run`ning
  put here
  res <- run 
  put prev
  return res

Operation within the context

Once we've established an indentation reference using withPos, implementation of combinators like sameOrIndented is easy: we just get the "current position" and compare it with the "reference position". To make this work with sameOrIndented we need to determine if our column count is equal-to-or-greater-than the reference one.

sameOrIndented = do
  here <- getPosition
  ref <- get
  if (sourceColumn here) >= (sourceColumn ref)
  then return ()
  else parseFail "bad identation"

Use in practice

With these combinators and ones like them you can annotate your parsers with "indentation block begins here" markers via withPos. Parses within these blocks may need to respect indentation and therefore contain options which include whitespace but are guarded by combinators like sameOrIndented and its ilk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment