An indentation-sensitive parser combinator language is one that helps you express
ideas like "this parse only succeeds if it's within the current indentation block".
The concept is small and elegant, and it is implemented in a few libraries. In
this writeup, I will use examples from indents.

The direct goal will be to write the sameOrIndented parser combinator with a type
like (but not identical to)

    sameOrIndented :: Parser s u () -- aspirational, not accurate

It fails whenever the current parse is at an indentation level less than a local
"reference" level, which is set by withPos

    withPos :: Parser s u a -> Parser s u a -- aspirational, not accurate

The idea is that you wrap a parse with withPos after parsing the indentation whitespace, and
the "inner" parse will have its reference set to the location where withPos was applied. It
makes little sense to call sameOrIndented (or other combinators like it) outside of a withPos
block, but the types do not prohibit it.

Indentation-sensitive parses must carry with them information about the current "reference"
indentation level. To do so, we use a monad transformer with a State SourcePos layer.
SourcePos is the Parsec type describing a location in the source text during parsing.
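
As a minimal sketch of such a stack, Parsec's ParsecT transformer can be layered over
State SourcePos from mtl. The names IndentParser and runIndentParser are hypothetical, chosen
for illustration; the types that indents actually exports may differ.

```haskell
import Control.Monad.State (State, evalState)
import Text.Parsec (ParseError, ParsecT, SourceName, runParserT)
import Text.Parsec.Pos (SourcePos, initialPos)

-- Hypothetical alias: a Parsec parser whose underlying monad is State SourcePos.
-- The State layer holds the "reference" position, not the current one.
type IndentParser s u a = ParsecT s u (State SourcePos) a

-- Hypothetical runner: parse a String with the reference initialised to the
-- start of the input (line 1, column 1).
runIndentParser :: IndentParser String () a -> SourceName -> String -> Either ParseError a
runIndentParser p name input =
  evalState (runParserT p () name input) (initialPos name)
```

Here evalState simply discards the final reference position once parsing has finished.
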
The function of withPos is to set this reference context locally. Now that we have State
effects to work with, this is easy: we use the Parsec monad to get the "current" location,
then temporarily store it in the State monad during the inner computation

    withPos run = do
      here <- getPosition -- Parsec effect
      prev <- get         -- State effect, stored to restore after `run`ning
      put here
      res <- run
      put prev
      return res

Once we've established an indentation reference using withPos, implementing combinators
like sameOrIndented is easy: we just get the "current position" and compare it with the
"reference position". For sameOrIndented, we need to determine whether our column count is
equal to or greater than the reference one.

    sameOrIndented = do
      here <- getPosition
      ref <- get
      if sourceColumn here >= sourceColumn ref
        then return ()
        else parserFail "bad indentation"
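
Sibling combinators differ only in the comparison they make. The sketch below assumes a
ParsecT-over-State stack; checkColumn, strictlyIndented, sameColumn, indentedWord and run are
hypothetical names chosen for illustration and need not match what indents exports.

```haskell
import Control.Monad.State (State, evalState, get)
import Text.Parsec
  ( Column, ParseError, ParsecT, char, getPosition, letter, many, many1
  , parserFail, runParserT, sourceColumn )
import Text.Parsec.Pos (SourcePos, initialPos)

-- Assumed stack: Parsec over a State layer holding the reference position.
type IndentParser a = ParsecT String () (State SourcePos) a

-- Hypothetical helper: succeed (consuming nothing) when the current column
-- relates to the reference column as `ok` demands, otherwise fail with `msg`.
checkColumn :: (Column -> Column -> Bool) -> String -> IndentParser ()
checkColumn ok msg = do
  here <- getPosition
  ref <- get
  if sourceColumn here `ok` sourceColumn ref
    then return ()
    else parserFail msg

sameOrIndented, strictlyIndented, sameColumn :: IndentParser ()
sameOrIndented   = checkColumn (>=) "bad indentation"
strictlyIndented = checkColumn (>)  "not indented"
sameColumn       = checkColumn (==) "column mismatch"

-- Skip leading spaces, then demand strict indentation relative to the reference.
indentedWord :: IndentParser String
indentedWord = many (char ' ') *> strictlyIndented *> many1 letter

-- Run with the reference initialised to line 1, column 1.
run :: IndentParser a -> String -> Either ParseError a
run p input = evalState (runParserT p () "demo" input) (initialPos "demo")
```

With the reference at column 1, run indentedWord "  abc" yields Right "abc", while
run indentedWord "abc" is rejected because the word starts at the reference column.
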
With these combinators and ones like them, you can annotate your parsers with "indentation block
begins here" markers via withPos. Parses within these blocks may need to respect indentation
and therefore contain options which include whitespace but are guarded by sameOrIndented
and its ilk.
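
To make this concrete, here is a small sketch of a block of words in which continuation lines
must start at or to the right of the block's first column. It assumes the ParsecT-over-State
stack; IndentParser, wordsBlock, blockAndRest and runIndent are hypothetical names introduced
only for this example.

```haskell
import Control.Monad.State (State, evalState, get, put)
import Text.Parsec
  ( ParseError, ParsecT, getInput, getPosition, letter, many1, parserFail
  , runParserT, sourceColumn, spaces )
import Text.Parsec.Pos (SourcePos, initialPos)

-- Assumed stack: Parsec over a State layer holding the reference position.
type IndentParser a = ParsecT String () (State SourcePos) a

-- Set the reference to the current position while `run` executes.
withPos :: IndentParser a -> IndentParser a
withPos run = do
  here <- getPosition
  prev <- get
  put here
  res <- run
  put prev
  return res

-- Succeed only at or to the right of the reference column.
sameOrIndented :: IndentParser ()
sameOrIndented = do
  here <- getPosition
  ref <- get
  if sourceColumn here >= sourceColumn ref
    then return ()
    else parserFail "bad indentation"

-- "Block begins here": every word is guarded, and the trailing `spaces`
-- (which also eats newlines) is the whitespace-consuming option.
wordsBlock :: IndentParser [String]
wordsBlock = withPos (many1 word)
  where
    word = sameOrIndented *> many1 letter <* spaces

-- Skip the indentation whitespace, parse one block, return the leftover input.
blockAndRest :: IndentParser ([String], String)
blockAndRest = do
  spaces
  ws <- wordsBlock
  rest <- getInput
  return (ws, rest)

runIndent :: IndentParser a -> String -> Either ParseError a
runIndent p input = evalState (runParserT p () "demo" input) (initialPos "demo")
```

Running runIndent blockAndRest "  foo bar\n   baz\nqux" gives
Right (["foo","bar","baz"],"qux"): the block opens at column 3, baz at column 4 still belongs
to it, and qux at column 1 fails the guard without consuming input, so the block ends there.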