Skip to content

Instantly share code, notes, and snippets.

@CMCDragonkai
Last active July 14, 2023 19:12
Show Gist options
  • Save CMCDragonkai/7155c44ff1890564f730 to your computer and use it in GitHub Desktop.
Save CMCDragonkai/7155c44ff1890564f730 to your computer and use it in GitHub Desktop.
Haskell: Simple Predictive Parser using LL(1)

Simple Predictive Parser using LL(1)

A predictive parser is a recursive descent parser without backtracking. It relies on the concept of a lookahead symbol to determine which production rule to use. It does not support left recursion, and requires any left-recursive grammars to be converted to right recursive grammars. A predictive parser is called LL(k) where k denotes the number of lookahead symbols used. This is LL(1) because we're only using one 1 look ahead symbol equivalent to 1 character terminal in the input string. There is no lexer required, thus in this case the tokens and terminals are the exact same thing.

Strings to be parsed:

1+2-4

Left Recursive Grammar:

expr = expr + term
     | expr - term
     | e

term = 0
     | 1
     | 2
     | 3
     | 4
     | 5
     | 6
     | 7
     | 8
     | 9

Left Recursion Eliminated Grammar:

expr = term rest

rest = + term rest 
     | - term rest
     | e

term = 0
     | 1
     | 2
     | 3
     | 4
     | 5
     | 6
     | 7
     | 8
     | 9

Some goals:

  1. Turn the above into post-fix notation.
  2. Failing that, just print out whatever you're doing.
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Char
import System.IO.Unsafe
import Control.Exception
-- String ~ [Token]
type Token = Char
-- helper functions
(|>) :: a -> (a -> b) -> b
(|>) f g = g f
lookAhead :: [Token] -> Maybe Token
lookAhead [] = Nothing
lookAhead (t:ts) = Just t
consume :: [Token] -> [Token]
consume [] = []
consume (t:ts) = ts
logOut :: (Show a) => a -> b -> b
logOut string expr = unsafePerformIO $ do -- unsafePerformIO extracts IO a -> a
print string
return expr
-- production functions
-- when the input is fully consumed, we return `tokens` which should be an empty list [Char] ~ ""
expr :: [Token] -> [Token]
expr tokens =
tokens |> term |> rest
-- the e production body is represented by the otherwise
rest :: [Token] -> [Token]
rest tokens = case lookAhead tokens of
Just s | s `elem` ['+', '-']
-> logOut s $ consume tokens |> term |> rest
otherwise
-> tokens
term :: [Token] -> [Token]
term tokens | Just d <- lookAhead tokens
, isDigit d
= logOut d $ consume tokens
| Just u <- lookAhead tokens
= error $ "expected digit, got something else --> " ++ show u
| Nothing <- lookAhead tokens
= tokens
-- `err :: SomeException` is mandatory, because `err` can be any type classified under `Exception` class
-- when evaluating `print err`, the compiler needs to know which instance to use
-- due to ambiguity it will not proceed without a user-supplied annotation as to the particular instance to use
-- as `err :: SomeException` is a pattern type signature, it needs the `ScopedTypeSignatures` extension
main :: IO ()
main = do
input <- getLine
catch (print $ expr input) (\(err :: SomeException) -> print err)
print "ok!"
{-
Results:
"'1'
'+'
'2'
'-'
'4'
"
"ok!"
This doesn't achieve my goal because logOut while producing a side effect is evaluating the `print` first. Even if logOut was moved
somewhere else in the pipe `|>` chain, it would still evaluate first, because function application has greater precedence and tighter
binding than the operators. No matter what the `print` runs first when using logOut. This is ok, because logOut isn't really meant to
be used this way. I was adapting the Java version of this parser from the Dragon book 2nd Edition (pg 75).
-}
{-# LANGUAGE ScopedTypeVariables, BangPatterns #-}
import Data.Char
import System.IO.Unsafe
import Control.Exception
-- String ~ [Token]
type Token = Char
-- helper functions
(|>) :: a -> (a -> b) -> b
(|>) f g = g f
lookAhead :: [Token] -> Maybe Token
lookAhead [] = Nothing
lookAhead (t:ts) = Just t
consume :: [Token] -> [Token]
consume [] = []
consume (t:ts) = ts
logOut :: (Show a) => a -> b -> b
logOut string expr = unsafePerformIO $ do -- unsafePerformIO extracts IO a -> a
print string
return expr
-- production functions
-- when the input is fully consumed, we return `tokens` which should be an empty list [Char] ~ ""
expr :: [Token] -> [Token]
expr tokens =
tokens |> term |> rest
-- the e production body is represented by the otherwise
rest :: [Token] -> [Token]
rest tokens = case lookAhead tokens of
Just s | s `elem` ['+', '-']
-> let !qwerty = consume tokens |> term -- <-- Muhahaha!
in logOut s (qwerty |> rest)
otherwise
-> tokens
term :: [Token] -> [Token]
term tokens | Just d <- lookAhead tokens
, isDigit d
= logOut d $ consume tokens
| Just u <- lookAhead tokens
= error $ "expected digit, got something else --> " ++ show u
| Nothing <- lookAhead tokens
= tokens
-- `err :: SomeException` is mandatory, because `err` can be any type classified under `Exception` class
-- when evaluating `print err`, the compiler needs to know which instance to use
-- due to ambiguity it will not proceed without a user-supplied annotation as to the particular instance to use
-- as `err :: SomeException` is a pattern type signature, it needs the `ScopedTypeSignatures` extension
main :: IO ()
main = do
input <- getLine
catch (print $ expr input) (\(err :: SomeException) -> print err)
print "ok!"
{-
Results:
"'1'
'2'
'+'
'3'
'-'
"
"ok!"
-}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment