Created
March 15, 2012 01:00
This example depends on parsec >= 3.0 and the indents package >= 0.3.2.

To install indents, run the following command:

    cabal install indents

> module Main where
> import Text.Parsec
> import Text.Parsec.Indent
> import qualified Control.Monad.State as S

First, let's define the input data we want to use. In this case, we want to
specify the name of the list by aligning it with the left-most column and
ending it with a colon. A name may contain numbers and letters. The items of
a list are indented beneath its name, one per line.
In this example, we have two lists defined.

> test_input :: String
> test_input = unlines [
>     "foo:",
>     " bar",
>     " baz",
>     " boop",
>     "",
>     "foop:",
>     " 123",
>     " 45234kdskfjs34",
>     " 9999999"
>   ]

Rather than using raw strings, we define the two different types of fields as
type synonyms. In this case, we have Names and Items, both of which are
represented as strings.

> type Name = String
> type Item = String

Finally, we define our goal type. We're calling it a named list. A named list
has a name and a list of items.

> data NamedList = NamedList Name [Item]
>   deriving (Show)

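As a quick illustration of the goal type, here is a small standalone sketch
that builds a NamedList by hand. The names 'example' and 'itemCount' are made
up for this illustration and are not used by the parser.

```haskell
type Name = String
type Item = String

data NamedList = NamedList Name [Item]
  deriving (Show)

-- A value the parser is meant to produce for the first list in the input.
example :: NamedList
example = NamedList "foo" ["bar", "baz", "boop"]

-- Name and Item are only synonyms, so ordinary String and list functions
-- work on them directly.
itemCount :: NamedList -> Int
itemCount (NamedList _ items) = length items
```

In GHCi, 'print example' shows NamedList "foo" ["bar","baz","boop"], which is
the representation 'main' prints for each parsed list.
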
Here's where things get a little tricky. The Parser type synonym that parsec
provides (in Text.Parsec.String) assumes a few things:
  1) The input stream is of type String.
  2) The user state is ().
  3) The underlying monad is the Identity monad.
Only the third assumption is a problem for us: the indentation parser needs a
different underlying monad. Instead of Identity, it uses (State SourcePos)
from Control.Monad.State, so we define our own synonym.

> type IParser a = ParsecT String () (S.State SourcePos) a

Since we've had to redefine Parser because of the change in the underlying monad,
we'll also redefine the 'parse' convenience function to handle our new type.
Our definition handles both the new underlying monad (via runParserT) and the
extra step needed by the indentation parser (runIndent).

> iParse :: IParser a -> String -> String -> Either ParseError a
> iParse parser file_name input =
>   runIndent file_name $ runParserT parser () file_name input

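To make the role of the underlying monad more concrete, here is a sketch that
runs a ParsecT over (State SourcePos) using only parsec and mtl. The names
'SParser', 'word' and 'runS' are made up for this illustration. Seeding the
state with 'initialPos' is our assumption about roughly what runIndent does
in indents 0.3.x, not a copy of its source.

```haskell
import Text.Parsec
import Text.Parsec.Pos (initialPos)
import qualified Control.Monad.State as S

-- Same shape as IParser in the text: ParsecT layered over State SourcePos.
type SParser a = ParsecT String () (S.State SourcePos) a

-- A trivial parser, just to have something to run.
word :: SParser String
word = many1 letter

-- runParserT returns its result inside the underlying monad, here a State
-- action. evalState unwraps that action given an initial position.
runS :: SParser a -> SourceName -> String -> Either ParseError a
runS p name input =
  S.evalState (runParserT p () name input) (initialPos name)
```

The plain 'parse' function cannot run 'word', because 'parse' fixes the
underlying monad to Identity. That is why a helper like iParse is needed.
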
Now, we need to define what we mean by 'a Name'. In this case, we want a name
to be defined by a non-empty string of alphanumeric characters followed by a
colon. Any whitespace after the colon is skipped.

> aName :: IParser String
> aName = do
>   n <- many1 alphaNum
>   _ <- char ':'
>   spaces
>   return n

An Item is defined similarly to a name, but we don't expect the trailing colon.

> anItem :: IParser String
> anItem = do
>   i <- many1 alphaNum
>   spaces
>   return i

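The name and item parsers do not depend on indentation, so their behaviour
can be tried out with the ordinary Parser type. The sketch below restates them
under the made-up names 'nameP' and 'itemP', with small helpers ('parseName',
'parseItem') that discard the error message.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One or more alphanumeric characters, a colon, then optional whitespace.
nameP :: Parser String
nameP = do
  n <- many1 alphaNum
  _ <- char ':'
  spaces
  return n

-- Like nameP, but without the colon.
itemP :: Parser String
itemP = do
  i <- many1 alphaNum
  spaces
  return i

-- Helpers that keep only success or failure.
parseName, parseItem :: String -> Maybe String
parseName = either (const Nothing) Just . parse nameP "demo"
parseItem = either (const Nothing) Just . parse itemP "demo"
```

Here parseName "foo:" gives Just "foo", while parseName "foo" and
parseName ":" both give Nothing. Note that 'spaces' also consumes newlines,
which is how each parser ends up at the start of the next line's content.
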
In this parser, we actually attempt to parse an indented block. In this case,
we're using the 'withBlock' construct. The 'withBlock' construct attempts to
parse a header (here, the list name) followed by an indented block of entries
(here, the items in the list). The two results are combined with a function.
In this case, we pass 'NamedList' as the function with which to combine the
name and the items.

> aNamedList :: IParser NamedList
> aNamedList = do
>   b <- withBlock NamedList aName anItem
>   spaces
>   return b

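To give a feel for the idea behind a header-plus-block parser, here is a
hand-rolled sketch using plain parsec and source columns. It is not the
indents implementation: 'blockWith', 'indentedPast', 'nameP', 'itemP' and
'parseLists' are hypothetical names, and the rule used here (each item must
start to the right of the header's column) is a simplifying assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run p only if the current column is strictly right of refCol.
-- Fails without consuming input otherwise, so 'many' can stop cleanly.
indentedPast :: Int -> Parser a -> Parser a
indentedPast refCol p = do
  pos <- getPosition
  if sourceColumn pos > refCol
    then p
    else parserFail "expected an indented item"

-- Parse a header, then zero or more items indented past the header's
-- starting column, and combine both results with f.
blockWith :: (a -> [b] -> c) -> Parser a -> Parser b -> Parser c
blockWith f header item = do
  start <- getPosition
  h <- header
  items <- many (indentedPast (sourceColumn start) item)
  return (f h items)

nameP, itemP :: Parser String
nameP = many1 alphaNum <* char ':' <* spaces
itemP = many1 alphaNum <* spaces

-- Parse one or more named blocks as (name, items) pairs.
parseLists :: String -> Maybe [(String, [String])]
parseLists =
  either (const Nothing) Just
    . parse (spaces *> many1 (blockWith (,) nameP itemP) <* eof) "demo"
```

With this sketch, parseLists "foo:\n bar\n baz\n\nfoop:\n 123\n" gives
Just [("foo",["bar","baz"]),("foop",["123"])]. An unindented line without a
colon, as in "foo:\n bar\nbaz\n", is read as a malformed name, so the result
is Nothing.
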
We might want to parse more than one list out of the same stream, so that's
exactly what we do here. We clear any preceding whitespace out of the input
stream and then attempt to parse at least one named list.

> someNamedLists :: IParser [NamedList]
> someNamedLists = do
>   spaces
>   lists <- many1 aNamedList
>   return lists

Finally, we invoke iParse on our parser, a dummy file name, and the test input
stream. If we get a valid result, we'll just print the Haskell representation
of the NamedList type. If we get a parse error, we'll print the error instead.

> main :: IO ()
> main = do
>   let res = iParse someNamedLists "some_source.txt" test_input
>
>   putStrLn "Indentation Parser Example"
>   case res of
>     Left err -> putStrLn $ "Error: " ++ show err
>     Right v -> print v