Inspired by "Parsing CSS with Parsec".
Just quick notes and code that you can play with in REPL.
By @kachayev
What do we have?
.container h1 {
color: rgba(255, 0, 0, 0.9);
font-size: 24px;
font-family: Monaco;
}
What do we want to get?
{:selector ".container h1",
:rules
({:key "color", :value "rgba(255, 0, 0, 0.9)"}
{:key "font-size", :value "24px"}
{:key "font-family", :value "Monaco"})}
Let's do this with Monadic parsing approach. Step-by-step explanation of how it works one can find in Monadic Parsing in Python.
What about Clojure?
From theory:
type Parser a = String -> [(a, String)]
So, we're going to represent parser
as simple function that takes input and returns seq of possible results. Simplest parsers:
;; takes any input and "consume" first char from it
(defn any [input]
(if (empty? input) '()
(list [(first input)
(apply str (rest input))])))
;; this one doesn't accept any input
(defn failure [_] '())
(any "clojure-1.7") ;; => ([c lojure-1.7])
We also will use few helpers:
(defn parse [parser input]
(parser input))
(defn parse-all [parser input]
(->> input
(parse parser)
(filter #(= "" (second %)))
ffirst))
I'm not going to pay much attention to what monads
are. Everything that you need to know is that we will use it to compose small parsers into bigger one. Also we're not going to implement entire "monad infrastructure".
;; builds parser that always returns given element without consuming (changing) input
(defn return [v]
(fn [input] (list [v input])))
;; takes parser and function that builds new parsers from (each) result of applying first one
(defn >>= [m f]
(fn [input]
(->> input
(parse m)
(mapcat (fn [[v tail]] (parse (f v) tail)))))
Next part is a simple macro that provides haskell-like do
notation. If you don't know how do
block works in Haskell, just skip this code and return to it when necessary.
(defn merge-bind [body bind]
(if (and (not= clojure.lang.Symbol (type bind))
(= 3 (count bind))
(= '<- (second bind)))
`(>>= ~(last bind) (fn [~(first bind)] ~body))
`(>>= ~bind (fn [~'_] ~body))))
(defmacro do* [& forms]
(reduce merge-bind (last forms) (reverse (butlast forms))))
Let's build basic parsers that can recognize chars, strings, letters, spaces, etc.
(defn sat [pred]
(>>= any (fn [v] (if (pred v) (return v) failure))))
;; just a helper
(defn char-cmp [f]
(fn [c] (sat (partial f (first c)))))
;; recognizes given char
(def match (char-cmp =))
;; rejects given char
(def noneOf (char-cmp not=))
;; just a helper
(defn from-re [re]
(sat (fn [v] (not (nil? (re-find re (str v)))))))
;; recognizes any digit
(def digit (from-re #"[0-9]"))
;; recognizes any letter
(def letter (from-re #"[a-zA-Z]"))
Lets try:
(parse (match "c") "clojure1.7")
;; => ([\c "lojure1.7"])
(parse letter "clojure1.7")
;; => ([\c "lojure1.7"])
(parse digit "1.7clojure")
;; => ([\1 ".7clojure"])
Ok, so far so good. Moving forward...
Useful combinators that will help us:
;; (ab)
(defn and-then [p1 p2]
(do*
(r1 <- p1)
(r2 <- p2)
;; xxx: note, that it's dirty hack to use STR to concat outputs
;; Full functional implementation should use MonadPlus protocol
(return (str r1 r2))))
;; (a|b)
(defn or-else [p1 p2]
(fn [input]
(lazy-cat (parse p1 input) (parse p2 input))))
(declare plus)
(declare optional)
;; (a*)
(defn many [parser] (optional (plus parser)))
;; (a+) equals to (aa*)
(defn plus [parser]
(do*
(a <- parser)
(as <- (many parser))
(return (cons a as))))
;; (a?)
(defn optional [parser] (or-else parser (return "")))
Pay special attention to plus
combinator implementation using do*
mentioned above. do*
is only syntax sugar for bind
operations, in reality plus
looks like:
(defn plus [parser]
(>>= parser
(fn [a] (>>= (many parser)
(fn [as] (return (cons a as)))))))
Example of combinators usage:
;; recognizes space (or newline)
(def space (or-else (match " ") (match "\n")))
;; recognizes empty string or arbitrary number of spaces
(def spaces (many space))
;; recognizes given string, i.e. "clojure"
(defn string [s] (reduce and-then (map #(match (str %)) s)))
Lets try to build something more complicated and interesting:
(def clojure-version (do*
(string "clojure")
(match " ")
(major <- digit)
(match ".")
(minor <- digit)
(return (str "major: " major "; minor: " minor))))
(parse-all clojure-version "clojure 1.7")
;; => "major: 1; minor: 7"
From CSS 2.1 grammar definition:
type Selector = String
data Rule = Rule String String deriving Show
data Ruleset = Ruleset Selector [Rule] deriving Show
In Clojure we can use records to represent data types:
(defrecord Rule [key value])
(defrecord Ruleset [selector rules])
Now lets define rule
and ruleset
parsers:
(def letter+ (or-else letter (match "-")))
(def rule (do*
(p <- (many (noneOf ":")))
(match ":")
spaces
(v <- (many (noneOf ";")))
(match ";")
spaces
(return (Rule. (apply str p) (apply str v)))))
(def ruleset (do*
(s <- (plus (noneOf "{")))
(match "{")
spaces
(r <- (plus rule))
spaces
(match "}")
spaces
(return (Ruleset. (trim (apply str s)) r))))
Play a bit:
(parse-all rule "background: #fafafa; ")
;; {:key "background", :value "#fafafa"}
(parse-all ruleset "p { background: #fafafa; color: red; }")
;; {:selector "p",
;; :rules
;; ({:key "background", :value "#fafafa"} {:key "color", :value "red"})}
(def css ".container h1 {
color: rgba(255, 0, 0, 0.9);
font-size: 24px;
font-family: Monaco;
}")
(parse-all ruleset css)
;; {:selector ".container h1",
;; :rules
;; ({:key "color", :value "rgba(255, 0, 0, 0.9)"}
;; {:key "font-size", :value "24px"}
;; {:key "font-family", :value "Monaco"})}
Looks nice.
-
it's easier to work both with monads and parsers in dynamically typed language
-
it's not that convenient to work with
string
s andlist
ofchar
s (you can see few "irrational"apply str _
andfirst _
) -
there is still room for improvement, i.e. more general
return
,bind
anddo*
-
you can try to rewrite parsers using applicative style