Created
November 3, 2014 19:54
-
-
Save tel/df3fa3df530f593646a0 to your computer and use it in GitHub Desktop.
Monad transformers and parser combinators
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{-# LANGUAGE ConstraintKinds #-} | |
{-# LANGUAGE GeneralizedNewtypeDeriving #-} | |
module ParserCombinators where | |
{- | |
We'll build a set of parser combinators from scratch demonstrating how | |
they arise as a monad transformer stack. Actually, how they arise as a | |
choice between two different monad transformer stacks! | |
-} | |
-------------------------------------------------------------------------------- | |
import Control.Applicative | |
import Control.Monad | |
import Control.Monad.State | |
{- | |
Parser combinators are computations which, behind the scenes, consume | |
a stream of inbound characters producing /parsed partial results/ or | |
failing due to failing to match the grammar. | |
We'll describe the most basic interface in a typeclass. A parser | |
combinator type must be | |
1. a 'Functor' where 'fmap' changes the partial result without any side effect, | |
2. a 'Monad' and an 'Applicative' where '(>>=)'/'(<*>)' indicates /sequencing/, | |
3. an 'Alternative' where '(<|>)' indicates (left-biased) choice, and finally | |
4. an instance of 'Parses' defined below which gives us the most basic parser | |
-} | |
class Parses p where | |
satisfy :: (Char -> Bool) -> p Char | |
-- We'll use ConstraintKinds to describe this overall set of | |
-- constraints. | |
type IsParser f | |
= ( Parses f | |
, Functor f | |
, Applicative f | |
, Alternative f | |
, MonadPlus f -- this is just for completeness, MonadPlus and | |
-- Alternative are the same thing! | |
, Monad f | |
) | |
-- And now we already can begin to write parser combinators, although, | |
-- honestly, each of these types is *far* too restrictive. | |
char :: Parses p => Char -> p Char | |
char c = satisfy (== c) | |
parensM :: (Parses m, Monad m) => m b -> m b | |
parensM p = do | |
char '(' | |
res <- p | |
char ')' | |
return res | |
many1M :: MonadPlus p => p a -> p [a] | |
many1M p = do | |
a <- p | |
as <- manyM p | |
return (a:as) | |
manyM :: MonadPlus p => p a -> p [a] | |
manyM p = many1M p `mplus` return [] | |
option :: Alternative p => p a -> p (Maybe a) | |
option p = fmap Just p <|> pure Nothing | |
choice :: Alternative p => [p a] -> p a | |
choice = foldr (<|>) empty | |
-- The @XM@ names are used to emphasize that these instances are | |
-- needlessly monadic. | |
parens :: (Parses p, Alternative p) => p b -> p b | |
parens p = (\_ a _ -> a) <$> char '(' <*> p <*> char ')' | |
many1 :: Alternative p => p a -> p [a] | |
many1 p = liftA2 (:) p (many p <|> pure []) | |
-- We'll eventually design our parser implementations to satisfy all | |
-- of 'IsParser', so since all of these combinators demand only | |
-- subsets of the constraints in 'IsParser' we'll ultimately be able | |
-- to use them all. | |
sepBy :: Alternative f => f a -> f s -> f [a] | |
sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure [] | |
sepBy1 :: Alternative f => f a -> f s -> f [a] | |
sepBy1 p s = scan where scan = liftA2 (:) p ((s *> scan) <|> pure []) | |
-- Also note that these are exactly the definitions used in real | |
-- parser combinator libraries like Attoparsec. | |
-- Because our parser combinator basis does not actually demand a | |
-- specific implementation of the parser type, we're also free to, for | |
-- instance, parse context sensitive grammars by stacking on | |
-- constraints like 'MonadState' | |
type IsContextSensitive s p = ( IsParser p, MonadState s p ) | |
-- So let's implement this now. | |
-------------------------------------------------------------------------------- | |
{- | |
Above we implemented a basis from which we can operate a | |
parser. Anything which satisfies the entire 'IsParser' interface | |
probably isn't lying. Let's examine two of these and see how they | |
arise as monad transformers. | |
The most basic parser is one which intakes an imput string and returns | |
'Maybe' the parsed interpretation of that string. That would nearly be | |
a perfect specification, except it doesn't talk about how to handle | |
"leftovers" if you only need to use part of the input to determine | |
whether or not this is a valid parse. | |
Instead of throwing leftovers away, we'll keep them around. | |
-} | |
runParser1 :: Parser1 a -> (String -> Maybe (a, String)) | |
runParser1 (Parser1 go) inp = go inp | |
newtype Parser1 a = Parser1 (String -> Maybe (a, String)) | |
{- | |
We can see that Parser1 implements the entire interface we need. I'll | |
just write that out, but we'll exmaine how it works in a little | |
bit. For now, just skip the following code and take it as material | |
proof that Parser1 is a sufficient design. | |
-} | |
instance Functor Parser1 where | |
fmap f (Parser1 go) = Parser1 $ \inp -> do | |
-- using the Maybe monad here | |
(a, outp) <- go inp | |
return (f a, outp) | |
instance Applicative Parser1 where | |
pure = return | |
p1 <*> p2 = do | |
f <- p1 | |
x <- p2 | |
return (f x) | |
instance Alternative Parser1 where | |
empty = Parser1 (\_ -> Nothing) | |
p1 <|> p2 = Parser1 $ \inp -> | |
case runParser1 p1 inp of | |
Nothing -> runParser1 p2 inp | |
Just x -> Just x | |
instance Monad Parser1 where | |
-- passes the input string straight through to the output | |
return a = Parser1 (\inp -> return (a, inp)) | |
p >>= f = Parser1 $ \inp -> do | |
(a, outp1) <- runParser1 p inp | |
(b, outp2) <- runParser1 (f a) outp1 | |
return (b, outp2) | |
instance Parses Parser1 where | |
satisfy pred = Parser1 $ \inp -> | |
case inp of | |
[] -> Nothing | |
c : cs | pred c -> Just (c, cs) | |
| otherwise -> Nothing | |
{- | |
Phew! | |
So why does all of the code above work? Why should we believe that we | |
could ever implement this stuff? | |
Because if we look carefully at 'Parser1' we see that it's \"just\" | |
the same as 'Parser2' | |
-} | |
newtype Parser2 a = | |
Parser2 (StateT String Maybe a) | |
deriving ( Functor, Monad, Applicative, Alternative ) | |
runParser2 :: Parser2 a -> String -> Maybe (a, String) | |
runParser2 (Parser2 go) inp = runStateT go inp | |
instance Parses Parser2 where | |
satisfy pred = Parser2 $ do | |
inp <- get | |
case inp of | |
[] -> fail "empty input" | |
c : cs | pred c -> put cs >> return c | |
| otherwise -> fail "satisfy" | |
{- | |
In other words, if we recognize that 'Parser1' is a monad transformer | |
stack of 'State' and 'Maybe' then all of the interfaces \"write | |
themselves\". In fact, if you go above and examine the interfaces | |
against the code for the implementation of 'State' then you'll see a | |
lot of similarities. | |
Which leads to the question: why are parsers equal to a combination of | |
'State' and 'Maybe' (failure)? Well, that seems to be a reasonable | |
description of what parsers do---we simply must restrict our notion of | |
state as being /causal/ in that we try to handle the present state | |
along a stream and throw it away when we're done: this is not at all | |
unlike @MonadState (Stream a)@. | |
This might make one wonder if we could change out the | |
components. And we can! | |
For instance, a /backtracking/ monadic parser combinator | |
implementation which satisfies all of the prior interface is just the | |
transformer stack of 'State' and '[]'. | |
-} | |
newtype Parser3 a = | |
Parser3 { runParser3 :: StateT String [] a } | |
deriving ( Functor, Monad, Applicative, Alternative ) | |
-- Note that this is literally a copy-and-paste job from | |
-- above... That's because this implementation works for /any/ | |
-- underlying monadic layer which has a notion of failure. | |
instance Parses Parser3 where | |
satisfy pred = Parser3 $ do | |
inp <- get | |
case inp of | |
[] -> fail "empty input" | |
c : cs | pred c -> put cs >> return c | |
| otherwise -> fail "satisfy" | |
{- | |
Here, we see that the @runX@ function denotes the idea that this | |
parser succeeds 0-or-more times just like we'd expect from a | |
non-deterministic, backtracking parse. | |
The list monad embodies depth-first search by default. What if we want | |
something breadth-first? Or if we want to ensure fairness in context | |
of an infinite space of potential parses? | |
We could rip out the '[]' layer and replace it with a monad from the | |
@logict@ or @omega@ packages, but instead let's just take advantage of | |
the idea that all we need to actually rely on is the /failure effect/: | |
-} | |
newtype Parser4 m a = | |
Parser4 { runParser4 :: StateT String m a } | |
deriving ( Functor, Monad, Applicative, Alternative ) | |
instance FailWith m => Parses (Parser4 m) where | |
satisfy pred = Parser4 $ do | |
inp <- get | |
case inp of | |
[] -> failWith "empty input" | |
c : cs | pred c -> put cs >> return c | |
| otherwise -> failWith "satisfy" | |
-- FailWith just let's us be a bit more specific about why we're | |
-- failing. The 'fail' implementation built into 'Monad' is widely | |
-- considered to be a mistake since it often has really bad default | |
-- implementations---not all monads actually have a notion of failure! | |
class Monad m => FailWith m where | |
failWith :: String -> m a | |
instance FailWith Maybe where | |
failWith _ = Nothing | |
instance FailWith [] where | |
failWith _ = [] | |
instance FailWith m => FailWith (StateT s m) where | |
failWith reason = lift (failWith reason) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment