Talking about Toys

Every year Santa 🎅 brings joy (and presents 🎁) to all the kids in the world. Well, not when I was a child! Following the usual Spanish traditions, the Three Reyes Magos brought me presents in their camels 🐫 following the Star 💫 Nowadays, it's Sinterklaas who puts pepernoten in my shoes 👞. Because in fact there's not one, but a group of present-bringing people, distributed across the globe 🌐, each with their own part of the world to cover (or you really thought just one man and eight reindeers could do it all by their own?)

In order to communicate, they need a way to exchange information about presents. Since they are all very wise, they've agreed to use Haskell, so this information is represented using algebraic data types. Here are some types related to building blocks:

data Block = Block BlockBrand String Color
data BlockBrand = Lego | KNex | Kappla | ...
data Color = Blue | Red | Yellow | ...

Given that they have thousands of kinds of toys and huge enumerations, they would prefer not to define serialization manually. "It's almost Christmas, just use JSON and call it a day", I hear you thinking. They cannot, because otherwise the Grinch would try to steal everything! They follow their very own protocol. Could we maybe help them to do this once and for all?

Setting up the stage

In order to write a function, you often first write out its type. In our case we are going to represent the serialization as a Builder for ByteStrings. We need our messages to arrive fast, and this is one of the best ways to generate byte-data in Haskell!

import Data.ByteString.Builder (Builder)
import qualified Data.ByteString.Builder as B

Every such generic operation is defined by means of a type class. Why is this necessary? Well, serialization cannot be defined for every type -- think of how you would serialize functions -- and classes is how you carve out a subset of types.

class Santinize a where
  santinize :: a -> Builder

We can give a few instances for basic types. In order to distract the Grinch, our protocol is going to introduce a lot of (proper) noise in the encoding.

instance Santinize Int where
  santinize x = B.shortByteString "UU" <> B.intDec x <> B.shortByteString "UU"

instance Santinize String where
  santinize x = B.shortByteString "YOYO" <> B.stringUtf8 x

Enter generic programming

(Data type) generic programming is a technique for writing functions which operate on many data types, in which we are aware of the structure. In other words, you can express things such as "do the following over each field in a record". Haskell is one of the languages with the best support for this kind of programming, due to a combination of built-in derivation of the Generic class and the type class system. In this post we are going to use the generics-sop library, which provides one of the simplest interfaces to this functionality:

import Generics.SOP

The key idea is to realize that regular data types in Haskell (that is, those which you can write without additional extensions such as GADTs) always follow the same pattern:

A choice of constructor,
With a sequence of fields,
Each containing a single value.

In case of a record we only have one constructor, yet still still choose it in step (1). In case of proper enumerations -- like the type BlockBrand above -- the sequence of fields in (2) is empty. generics-sop provides a way to write functions by representing those steps as their own data types:

First a newtype SOP which just marks that we are moving into the world of data type generic programming,
Choice is represented by NS; the S comes from "sum", as we often call things like enumerations "sum types",
The sequence of fields is represented by NP; here the P comes from "product type", another name for records,
Finally, each single value is wrapped in a newtype called I.

Armed with this new knowledge, let's write the generic function -- or in fact three of them. gsantinize is the top-level one which simply removes the newtype layer.

gsantinize :: All2 Santinize r => SOP I r -> Builder
gsantinize (SOP x) = gsantinizeS x

gsantinizeS figures out which constructor has been taken. This is done in a quite clever way: imagine you lay out all the constructor in the same order they've been defined. You start at the first one: if you choose it, you indicate so using the Z constructor. If you prefer to move one constructor further, you use S. For example, KNex from BlockBrand above would be represented as S (Z ...), since it's the second constructor: move one, then choose.

gsantinizeS
  :: All2 Santinize choices
  => NS (NP I) choices -> Builder
gsantinizeS (Z x) = B.shortByteString "NO" <> gsantinizeP x
gsantinizeS (S x) = B.shortByteString "NI" <> gsantinizeS x

Notice that whereas the S case in gsantinizeS calls itself recursively, the Z case moves to gsantinizeP, which takes care of steps (2) and (3): iterating over the sequence of fields and doing something with each value. Due to the typing mechanisms in the library, the regular list type cannot be used, instead we have Nil for the empty sequence and (:*) for adjoining an element at the front -- what we usually call [] and (:).

gsantinizeP
  :: All Santinize fields
  => NP I fields -> Builder
gsantinizeP Nil         = mempty
gsantinizeP (I x :* xs) = santinize x <> gsantinizeP xs

There's something on those definitions which you are surely wondering about: what is All2 and All? This is a way to constrain when we can apply these generic operations. In particular, at the very last step in gsantinizeP, we called santinize to write down the value of that field. That means the we can only use gsantinize if every field in the data type implements the Santinize class. This is encoded by All at the product level, and All2 at the sum level -- the 2 indicates "two levels deep", since we need that all constructors are such that all fields implement Santinize.

Using the generic function

Simply writing gsantinize does not make it available for usage in every data type which may support it. Instead, using a generic function is opt-in in GHC. To use it, you have two follow a couple of steps.

First you need to ask GHC and generics-sop to generate the code which turns your data type into the representation used by the latter. There's a lot of magic ⚗️ involved here (we first ask the compiler to derive another representation called GHC.Generic, which is then turned into the generics-sop one), but the nice thing is we don't have to care 🤷, just include a few more elements in the deriving clause.

{-# language DeriveGeneric, DeriveAnyClass #-}

import qualified GHC.Generics as GHC

data Block = Block BlockBrand String Color
  deriving (GHC.Generics, Generic)

Now every instance is as simple as converting to the representation used by generics-sop -- this is done via the aptly-called from -- and then use gsantinize.

instance Santinize Block where
  santinize = gsantinize . from

You can even go one step further and tell GHC to use this implementation as default whenever the data type supports the corresponding constraints. To do so you need to rework the original Santinize type class.

{-# language DefaultSignatures #-}

class Santinize a where
  santinize :: a -> Builder

  default santinize
    :: (Generic a, All2 Santinize (Code a))
    => a -> Builder
  santinize = gsantinize . from

The last four lines state: here's a default definition of santinize which can be used whenever:

We can convert it to the representation used by generic-sop. This is indicated by the Generic constraint. The same one we derived automatically one minute ago!
And then the constraint All2 Santinize we carry on from gsantinize.

With this additional set-up we can even make Santinize part of the deriving clause!

data Block = Block BlockBrand String Color
  deriving (GHC.Generics, Generic, Santinize)

Hooray! 🙌

A word from our sponsors

If you liked this post, and you are interested not only in fancy type level programming in Haskell, but also more pragmatic material, check my book Practical Haskell. I also happen to have written a book with more than you ever wanted to know about monads. Ah! and at 47 Degrees Academy we are happy to offer you trainings about Haskell and many other functional languages.

serras/advent-haskell-2020.md

Talking about Toys

Setting up the stage

Enter generic programming

Using the generic function

A word from our sponsors