Every year Santa ๐ brings joy (and presents ๐) to all the kids in the world. Well, not when I was a child! Following the usual Spanish traditions, the Three Reyes Magos brought me presents in their camels ๐ซ following the Star ๐ซ Nowadays, it's Sinterklaas who puts pepernoten in my shoes ๐. Because in fact there's not one, but a group of present-bringing people, distributed across the globe ๐, each with their own part of the world to cover (or you really thought just one man and eight reindeers could do it all by their own?)
In order to communicate, they need a way to exchange information about presents. Since they are all very wise, they've agreed to use Haskell, so this information is represented using algebraic data types. Here are some types related to building blocks:
data Block = Block BlockBrand String Color
data BlockBrand = Lego |ย KNex |ย Kappla | ...
data Color = Blue |ย Red | Yellow | ...
Given that they have thousands of kinds of toys and huge enumerations, they would prefer not to define serialization manually. "It's almost Christmas, just use JSON and call it a day", I hear you thinking. They cannot, because otherwise the Grinch would try to steal everything! They follow their very own protocol. Could we maybe help them to do this once and for all?
In order to write a function, you often first write out its type. In our case we are going to represent the serialization as a Builder
for ByteStrings
. We need our messages to arrive fast, and this is one of the best ways to generate byte-data in Haskell!
import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as B
Every such generic operation is defined by means of a type class. Why is this necessary? Well, serialization cannot be defined for every type -- think of how you would serialize functions -- and classes is how you carve out a subset of types.
class Santinize a where
santinize :: a -> Builder
We can give a few instances for basic types. In order to distract the Grinch, our protocol is going to introduce a lot of (proper) noise in the encoding.
instance Santinize Int where
santinize x = B.shortByteString "UU" <> B.intDec x <> B.shortByteString "UU"
instance Santinize String where
santinize x = B.shortByteString "YOYO" <> B.stringUtf8 x
(Data type) generic programming is a technique for writing functions which operate on many data types, in which we are aware of the structure. In other words, you can express things such as "do the following over each field in a record". Haskell is one of the languages with the best support for this kind of programming, due to a combination of built-in derivation of the Generic
class and the type class system. In this post we are going to use the generics-sop
library, which provides one of the simplest interfaces to this functionality:
import Generics.SOP
The key idea is to realize that regular data types in Haskell (that is, those which you can write without additional extensions such as GADTs
) always follow the same pattern:
- A choice of constructor,
- With a sequence of fields,
- Each containing a single value.
In case of a record we only have one constructor, yet still still choose it in step (1). In case of proper enumerations -- like the type BlockBrand
above -- the sequence of fields in (2) is empty. generics-sop
provides a way to write functions by representing those steps as their own data types:
- First a newtype
SOP
which just marks that we are moving into the world of data type generic programming, - Choice is represented by
NS
; theS
comes from "sum", as we often call things like enumerations "sum types", - The sequence of fields is represented by
NP
; here theP
comes from "product type", another name for records, - Finally, each single value is wrapped in a newtype called
I
.
Armed with this new knowledge, let's write the generic function -- or in fact three of them. gsantinize
is the top-level one which simply removes the newtype layer.
gsantinize :: All2 Santinize r => SOP I r -> Builder
gsantinize (SOP x) = gsantinizeS x
gsantinizeS
figures out which constructor has been taken. This is done in a quite clever way: imagine you lay out all the constructor in the same order they've been defined. You start at the first one: if you choose it, you indicate so using the Z
constructor. If you prefer to move one constructor further, you use S
. For example, KNex
from BlockBrand
above would be represented as S (Z ...)
, since it's the second constructor: move one, then choose.
gsantinizeS
:: All2 Santinize choices
=> NS (NP I) choices -> Builder
gsantinizeS (Z x) = B.shortByteString "NO" <> gsantinizeP x
gsantinizeS (S x) = B.shortByteString "NI" <> gsantinizeS x
Notice that whereas the S
case in gsantinizeS
calls itself recursively, the Z
case moves to gsantinizeP
, which takes care of steps (2) and (3): iterating over the sequence of fields and doing something with each value. Due to the typing mechanisms in the library, the regular list type cannot be used, instead we have Nil
for the empty sequence and (:*)
for adjoining an element at the front -- what we usually call []
and (:)
.
gsantinizeP
:: All Santinize fields
=> NP I fields -> Builder
gsantinizeP Nil = mempty
gsantinizeP (I x :* xs) = santinize x <> gsantinizeP xs
There's something on those definitions which you are surely wondering about: what is All2
and All
? This is a way to constrain when we can apply these generic operations. In particular, at the very last step in gsantinizeP
, we called santinize
to write down the value of that field. That means the we can only use gsantinize
if every field in the data type implements the Santinize
class. This is encoded by All
at the product level, and All2
at the sum level -- the 2
indicates "two levels deep", since we need that all constructors are such that all fields implement Santinize
.
Simply writing gsantinize
does not make it available for usage in every data type which may support it. Instead, using a generic function is opt-in in GHC. To use it, you have two follow a couple of steps.
First you need to ask GHC and generics-sop
to generate the code which turns your data type into the representation used by the latter. There's a lot of magic โ๏ธ involved here (we first ask the compiler to derive another representation called GHC.Generic
, which is then turned into the generics-sop
one), but the nice thing is we don't have to care ๐คท, just include a few more elements in the deriving
clause.
{-# language DeriveGeneric, DeriveAnyClass #-}
import qualified GHC.Generics as GHC
data Block = Block BlockBrand String Color
deriving (GHC.Generics, Generic)
Now every instance is as simple as converting to the representation used by generics-sop
-- this is done via the aptly-called from
-- and then use gsantinize
.
instance Santinize Block where
santinize = gsantinize . from
You can even go one step further and tell GHC to use this implementation as default whenever the data type supports the corresponding constraints. To do so you need to rework the original Santinize
type class.
{-# language DefaultSignatures #-}
class Santinize a where
santinize :: a -> Builder
default santinize
:: (Generic a, All2 Santinize (Code a))
=> a -> Builder
santinize = gsantinize . from
The last four lines state: here's a default definition of santinize
which can be used whenever:
- We can convert it to the representation used by
generic-sop
. This is indicated by theGeneric
constraint. The same one we derived automatically one minute ago! - And then the constraint
All2 Santinize
we carry on fromgsantinize
.
With this additional set-up we can even make Santinize
part of the deriving
clause!
data Block = Block BlockBrand String Color
deriving (GHC.Generics, Generic, Santinize)
Hooray! ๐
If you liked this post, and you are interested not only in fancy type level programming in Haskell, but also more pragmatic material, check my book Practical Haskell. I also happen to have written a book with more than you ever wanted to know about monads. Ah! and at 47 Degrees Academy we are happy to offer you trainings about Haskell and many other functional languages.