An "invertible syntax description" is something that can be used to both parse and generate a syntax.
For example, instead of defining separate toJson and fromJson functions for a given data type, an invertible syntax description could be used to provide both.
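To make the idea concrete, here's a minimal sketch of the interface we're after. The Codec type and its fields are made-up names for illustration, not from any particular library:

-- A single value that packs both directions together, so they can't
-- drift apart. (Hypothetical type; names are mine.)
data Codec a = Codec
  { encode :: a -> String
  , decode :: String -> Maybe a
  }

boolCodec :: Codec Bool
boolCodec = Codec
  { encode = \b -> if b then "true" else "false"
  , decode = \s -> case s of
      "true" -> Just True
      "false" -> Just False
      _ -> Nothing
  }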
There are many Haskell libraries that provide this functionality or something close to it.
As far as I can tell most of them are at least inspired by Invertible syntax descriptions by Tillmann Rendel and Klaus Ostermann.
Personally I am interested in using these for HTTP routing.
I frequently want to be able to define a route such as /episodes/:id.json, dispatch based on that route, and generate links to that route.
Doing so manually is tedious and error prone.
Talking with a friend at Flipstone unveiled their text-routing project, which does exactly that.
I am considering adopting their project at ITProTV, but I wanted to survey the options on Hackage first.
That's what this document is for.
There are two features that are absolutely critical for me:
- Being able to have "holes" in the parse result. For example, one way to think about routing is a function that takes a path and returns a handler (that is, String -> Handler). That's how scotty works, for example. I'm not satisfied with that because a route like /episodes/:id.json should really require an episode ID before returning the handler (String -> (EpisodeId -> Handler)). Doing that type of thing with normal routing means you have to be sure to match the identifier in your route (:id) with the identifier in your handler.
- Being able to carry around data on the side when parsing and generating. For example, /episodes/:id.json isn't the entire location of a resource because you also need the HTTP method. When matching on a route I'll need to make sure the method matches in addition to matching the path. And when generating a route I'll need to produce a method along with the path. I would like to do this without resorting to in-band signaling like "GET /episodes/123.json". (A rough sketch of both features follows this list.)
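Here's the sketch promised above. Everything in it is hypothetical (the Route type, its fields, and the assumption that paths arrive pre-split into pieces); it's only meant to show the shape of the two features:

import Data.List (isSuffixOf)
import Text.Read (readMaybe)

-- A typed "hole" (the a) plus data carried on the side (the method).
data Route a = Route
  { method :: String             -- matched and generated out of band
  , match :: [String] -> Maybe a -- path pieces in, typed value out
  , render :: a -> [String]      -- typed value in, path pieces out
  }

-- /episodes/:id.json with a hole for the episode ID.
episodeRoute :: Route Int
episodeRoute = Route
  { method = "GET"
  , match = \pieces -> case pieces of
      ["episodes", file] -> stripSuffix ".json" file >>= readMaybe
      _ -> Nothing
  , render = \episodeId -> ["episodes", show episodeId ++ ".json"]
  }

-- base doesn't provide stripSuffix for String, so define it here.
stripSuffix :: String -> String -> Maybe String
stripSuffix suffix s
  | suffix `isSuffixOf` s = Just (take (length s - length suffix) s)
  | otherwise = Nothing

Note that match and render are still two separate hand-written functions here, which is exactly the duplication an invertible syntax description should eliminate.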
As I survey these libraries, here are the things I'm looking for:
- Builds with GHC 8.8.
- Proven support for "real world" formats like CSV, JSON, BSON, and of course web routes.
- Well maintained, responsive to issues and pull requests, with source available somewhere nice.
- Well tested, either by lots of users or a robust test suite.
- Well documented, both at the API level and the package level with worked examples.
- Small core that's easy to understand and extend, like attoparsec.
- Avoids crazy extensions and operators.
- Avoids having a lot of dependencies, and avoids "big" dependencies like lens.
- Avoids leaning too heavily on generic deriving.
- Avoids Template Haskell entirely, except maybe as an optional feature.
- Avoids arrows.
And here are things that I don't care about too much:
- Faster is better, but ultimately I'm looking for "not slow".
- Things should be easy to reason about, but I care more about utility than theoretical purity.
- Being available on Stackage would be nice, but I can do this myself.
After surveying all the options, hopefully I'll find something that suits my needs. If not, I'm comfortable making my own, at which point this document can be used to motivate its existence.
Generally speaking this document is meant to be read top to bottom. It's roughly a stream of consciousness, although it was compiled over several days. And I've made edits here and there.
The packages I'll be surveying, in order:
- invertible-syntax
- Zwaluw
- boomerang
- web-inv-route
- text-routing
- JsonGrammar
- unjson
- tomland
- codec
- roundtrip
- syntax
Right out of the gate I'm surprised that there are so many libraries.
Before researching this I was only aware of boomerang and tomland.
If this is such a powerful and universal concept, why are all these libraries reinventing the wheel?
I suppose things aren't much better for parsing (parsec, Earley, trifecta, ...) and pretty printing (wl-pprint, pretty, prettyprinter, ...) separately, so there's no reason why both of them together would be any better.
Really the most concerning thing is that the language-specific libraries like JsonGrammar don't reuse one of the lower level libraries like codec.
Are there problems with the low level libraries, or were the authors of the high level libraries simply unaware of them?
Hopefully I'll find out.
I will reference "the paper" a lot, which means Invertible syntax descriptions. I'd recommend at least reading the abstract to get a feel for what it's about.
First up is invertible-syntax. This package's author is one of the authors of the paper.
It seems like this is a more or less direct translation of the paper's code into a package.
The package depends on partial-isomorphisms, which defines the Iso type from the paper along with some helpful functions.
What follows is essentially an overview of the paper through the lens of this package's documentation and exported identifiers.
At its core, this defines a couple data types.
The parser is pretty typical: Parser (String -> [(alpha, String)]), like ReadS. The pretty printer is a little stranger: Printer (alpha -> Maybe String).
How can printing something fail?
I've found it illustrative to consider a parser that consumes exactly two characters of input, like a country code (US, UK, FR, ...). The corresponding pretty printer should fail if it somehow produces any amount of output other than two characters, like USA.
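In code, following the paper's definitions (the countryCode example is mine):

newtype Parser a = Parser (String -> [(a, String)]) -- like ReadS
newtype Printer a = Printer (a -> Maybe String)

-- A printer that refuses to produce anything other than two characters.
countryCode :: Printer String
countryCode = Printer $ \code ->
  if length code == 2 then Just code else Nothing

-- runPrinter countryCode "US"  => Just "US"
-- runPrinter countryCode "USA" => Nothing
runPrinter :: Printer a -> a -> Maybe String
runPrinter (Printer p) = p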
Anyway, both the Parser and Printer implement the Syntax type class, which has three superclasses: IsoFunctor, ProductFunctor, and Alternative.
It also defines a couple methods, but we'll get back to that.
IsoFunctor is a type class for partial isomorphisms.
An isomorphism is a lossless conversion between two types, like Maybe () and Bool. (Because they have the same number of values, Maybe () and Bool can be converted between each other without losing any information.)
A partial isomorphism is a lossy conversion.
For example, consider String and Rational. You could parse a String as a Rational, like "1.2" as 12 % 10, but that conversion can clearly fail when given values like "twelve". And you could represent a Rational as a String by going the other way, but some numbers can't be represented, like 1 % 3.
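Here's how that example might look with the Iso type from the paper. The Iso definition matches the paper's; the decimal parsing and printing helpers are my own simplified stand-ins:

import Data.Ratio (denominator, (%))
import Text.Read (readMaybe)

-- Both directions can fail.
data Iso a b = Iso (a -> Maybe b) (b -> Maybe a)

decimal :: Iso String Rational
decimal = Iso parseDecimal printDecimal

-- "1.2" becomes 12 % 10 (reduced to 6 % 5); "twelve" becomes Nothing.
-- (This toy version ignores negative numbers.)
parseDecimal :: String -> Maybe Rational
parseDecimal s = case break (== '.') s of
  (whole, '.' : frac) -> do
    w <- readMaybe whole
    f <- readMaybe frac
    Just (fromInteger w + f % (10 ^ length frac))
  (whole, "") -> fromInteger <$> readMaybe whole
  _ -> Nothing

-- Printing succeeds only for terminating decimals: the reduced
-- denominator can contain only factors of 2 and 5, so 1 % 3 fails.
printDecimal :: Rational -> Maybe String
printDecimal r
  | stripFactors (denominator r) == 1 =
      Just (show (fromRational r :: Double))
  | otherwise = Nothing
  where
    stripFactors n
      | n `mod` 2 == 0 = stripFactors (n `div` 2)
      | n `mod` 5 == 0 = stripFactors (n `div` 5)
      | otherwise = n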
Unsurprisingly the IsoFunctor type class is a lot like Functor. Given an isomorphism between two types (Iso a b), an IsoFunctor lets you map over a value of one type to produce the other. More concretely, (<$>) :: IsoFunctor f => Iso a b -> f a -> f b.
This type class is a necessary addition over the existing Functor class because Parser is covariant but Printer is contravariant.
mapParser :: (a -> b) -> Parser a -> Parser b -- covariant
mapPrinter :: (b -> a) -> Printer a -> Printer b -- contravariant
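Possible implementations, reusing the Parser and Printer newtypes sketched earlier:

-- Map over the parse results, leaving the leftover input alone.
mapParser :: (a -> b) -> Parser a -> Parser b
mapParser f (Parser p) =
  Parser (\s -> [(f a, rest) | (a, rest) <- p s])

-- Convert the value before handing it to the underlying printer.
mapPrinter :: (b -> a) -> Printer a -> Printer b
mapPrinter f (Printer p) = Printer (p . f)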
I've never been good with variance.
To put it in terms I can understand, Parser is covariant because it eventually produces an a; Printer is contravariant because it consumes an a and produces something else. An IsoFunctor could be one or the other or both.
Moving on, ProductFunctor is similar to Applicative. This is easier to grok because it doesn't depend on the whole partial isomorphism thing. Instead it's merely used to produce products, which in Haskell are tuples. Hence (<*>) :: ProductFunctor f => f a -> f b -> f (a, b). Compare that to Applicative's (<*>), which has type Applicative f => f (a -> b) -> f a -> f b.
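Sketch instances for the earlier Parser and Printer newtypes (hiding Prelude's (<*>) so the class can reuse the name):

import Prelude hiding ((<*>))

class ProductFunctor f where
  (<*>) :: f a -> f b -> f (a, b)

-- Run the first parser, then the second on the leftover input.
instance ProductFunctor Parser where
  Parser p <*> Parser q =
    Parser (\s -> [((a, b), s'') | (a, s') <- p s, (b, s'') <- q s'])

-- Print both halves of the pair and concatenate; fail if either fails.
instance ProductFunctor Printer where
  Printer p <*> Printer q = Printer $ \(a, b) -> do
    left <- p a
    right <- q b
    Just (left ++ right)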
Up next is Alternative, which is just like the one in base except it doesn't have any superclass constraints.
As a refresher:
class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
Finally we're ready to tie those three classes together in the Syntax type class. It introduces two new methods: pure and token.
Both are easy to understand in the context of parsers and printers.
In a parser, pure x will produce x without consuming input. In a printer, pure x will produce an empty string for values equal to x and otherwise produce nothing. (That's kind of confusing. In code: if actual == expected then Just "" else Nothing.)
In a parser, token x will consume exactly x, which is probably a Char in typical use cases. In a printer, token x will produce exactly x.
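Following that prose, minimal stand-alone implementations might look like this (plain functions rather than the package's actual class methods):

-- pure: the parser produces x without consuming input; the printer
-- emits nothing, but only for the expected value.
pureParser :: a -> Parser a
pureParser x = Parser (\s -> [(x, s)])

purePrinter :: Eq a => a -> Printer a
purePrinter expected = Printer $ \actual ->
  if actual == expected then Just "" else Nothing

-- token: the parser consumes exactly the given character; the printer
-- produces exactly that character.
tokenParser :: Char -> Parser Char
tokenParser x = Parser $ \s -> case s of
  c : rest | c == x -> [(c, rest)]
  _ -> []

tokenPrinter :: Char -> Printer Char
tokenPrinter x = Printer $ \c ->
  if c == x then Just [c] else Nothing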
And that's ... it. With these basic building blocks, the package produces a rudimentary parser/printer. It provides some combinators, but that's about it. It feels kind of unsatisfying after walking through that paper.
Practically speaking this seems like it could be a good foundation for a more robust library, but at that point I'm not sure why you wouldn't re-implement these type classes yourself. Sure, you'd save yourself a little work, but you'd be taking on this apparently unmaintained dependency. In short: Great idea, solid implementation, but not really suitable for use as a library. At least it gives us a good foundation for the rest of the packages we're surveying.
Next up is Zwaluw. For starters, there's next to no documentation on the package, in the one exposed module, or on GitHub. The latest commit was more than eight years ago. So it's probably not the greatest package from a software engineering point of view. Regardless, let's dive in.
This package is focused on web routes specifically.
I think it's the basis for the more recent boomerang package, so it probably has some good ideas.
The main module it exposes is surprisingly short.
It exports two types: Router a b and a :- b. I think Router a b is an invertible parser from String to b. Not sure why a is there. Maybe for the Category or Monoid instance? And I think a :- b is for building "functions" out of parsers. Or, to put it differently, it lets you use constructors in Routers.
There aren't any examples in the documentation, but luckily there are a few in the source repository. Unfortunately the examples use Template Haskell, which wasn't mentioned in the documentation at all. Seems like the repo probably drifted from Hackage a bit. Since I'm trying to avoid TH, I'm not going to dig too deep into this library.
Let's quickly break down an example. They start with an ADT for all the routes, which I like.
data Sitemap
= Home
| UserOverview
| UserDetail Int
| Article Int String
Then comes a TH splice, a type family instance, and a bunch of top level declarations run together. This is gross to me.
$(deriveAll ''Sitemap "PF_Sitemap")
type instance PF Sitemap = PF_Sitemap
Z rHome :& Z rUserOverview :& Z rUserDetail :& Z rArticle = mkRouters
It's not clear to me what "PF" means.
I also don't understand why this isn't all done with a single TH call.
Why split it up?
Every time you add a new route you'll have to change the mkRouters line.
Anyway, moving on.
The sitemap itself uses (/) for path separators, which is cute. Also (<>) for combining different routes, which makes sense. And (.) from Control.Category for building up a single route, which is alright but kind of annoying that you can't use (.) from the Prelude.
sitemap :: Router r (Sitemap :- r)
sitemap = id /
  ( rHome
  <> "users" . users
  <> rArticle . ("article" / int . "-" . part)
  )
  where
    users = rUserOverview
         <> rUserDetail / int
Looks like constructors take tuples, which is why the rArticle line uses parentheses like that.
Overall the end result looks relatively clean, but it doesn't seem worth the high entry fee of Template Haskell.
Next is boomerang. The Happstack documentation says:
Sjoerd Visscher and Martijn van Steenbergen figured out exactly how to do that and published a proof of concept library known as Zwaluw. With permission, I have refactored their original library into two separate libraries: boomerang and web-routes-boomerang.
This package seems to have documentation and was updated less than two weeks ago, so maybe it's the "production ready" version of Zwaluw? Let's dive in.
Looks like Boomerang doesn't require type families and only uses a single Template Haskell splice for the sitemap data type, so that's an improvement over Zwaluw. Constructing data also appears to not require tuples, so that's nice. (It means fewer parens in the route definition.)
Functionally that seems to be about it. There's a lot of documentation and a lot of helpful combinators too, which is nice. I don't see any examples of how to do things without TH though. I could dig through the code to see what the splices generate, but I don't think that's worth it.
web-inv-route: TODO. Seems similar to boomerang, but hyper-focused on HTTP routing, to the point of building a map from the parser. Probably not useful in other contexts.
text-routing: TODO. Also very specific to web routing. Not on Hackage yet. Hopefully simpler than the alternatives and not tightly coupled to Happstack.
JsonGrammar: TODO. Focused on JSON for some reason; also involves TypeScript. Might have some neat stuff under the covers, but none of it is exposed.
unjson: TODO. Also focused on JSON. Seems to have some cool ideas about documentation, and interacts well with the ecosystem (aeson, attoparsec, pretty).
tomland: TODO. Focused on TOML. Recent and well documented. It would be nice to dig into this and see what kinds of problems they ran into (if any). Neat ideas: functions instead of type classes, a two-phase approach (text <=> AST <=> data), tagged partial bidirectional isomorphisms, and monadic profunctors. https://kowainik.github.io/posts/2019-01-14-tomland
codec: TODO. Seems like an entirely different approach; allows for monadic parsers rather than merely applicative ones. https://www.microsoft.com/en-us/research/publication/functional-pearl-pickler-combinators/ https://blog.poisson.chat/posts/2016-10-12-bidirectional-serialization.html https://blog.poisson.chat/posts/2017-01-01-monadic-profunctors.html
roundtrip: TODO. Looks very similar to the paper, but more robust and "production ready", with JSON and XML libraries.
syntax: TODO. Looks like a big improvement over the paper, but requires arrows.