GADTs: A primer

Sketch link

This post assumes you are familiar with Algebraic Data Types (sum types/variants/tagged unions and product types: records, tuples, etc) and comfortable reading type signatures. For a brush up on ADTs and specifically on sum types/variants, refer Reason's documentation and rauschmayer's blog on variants in Reason. Also while you are at it, a quick look on phantom types wouldn't hurt 🙂.

Generalized Algebraic Data Type

Generalized Algebraic Data Type—which I will refer to by the more radio-friendly name GADT hereafter—is a powerful and quite a unique¹ feature. If you are coming from an untyped or a weakly typed language and switched to Reason for the type safety it provides, then GADTs are just one more step further in providing more compile time guarantees. In a nutshell GADTs allow us more control on type safety than simple ADTs offer.

Some common usecases for GADTs:

Generic programming GADTs allows us to do a bit of runtime type checking but in a type safe manner—further discussed in the post how GADTs can be used for polymorphic returns. although not covered in this post but jane street's blogpost explains how GADTs can be used for making generalized data structures for performance gains.
Verified Data structures by having more control on types, we can make data structures e.g. a self balancing trees that would refuse to compile if unbalanced.
Type-safe DSLs GATDs are quite extensively used for building domain specific languages for e.g. dsl for type safe database access or type safe printf.

ADTs as GADTs

Here is a simple ADT representing booleans. This is isomorphic to the existing bool type in Reason.

type bool' = True | False

GADTs enable us to explicitly specify types for data constructors² when defining them. This is the key point of GADTs.

Thus the above ADT can be written in a GADT style by stating both True and False have a type bool'.

type bool' =
  | True: bool'
  | False: bool';

Now we have promoted our simple ADT for booleans to a GADT representation, but what did we gain out of it ?

Well, nothing. Literally, we didn't gain anything more that what the ADT could already do, but let's keep continuing and make ourselves a little more familiar with GADTs.

Like we did for boolean we can do the same for option type and see that it still behave the same. We can also go ahead and implement a map function same as the Belt.Option.map for our promoted option' type.

type option'('a) =
  | None': option'('a)
  | Some'('a): option'('a);

let mapOption' = (f, opt) =>
  switch (opt) {
  | None' => None'
  | Some'(x) => Some'(f(x))
  };

let a = Some'(5);
let b = None';
let inc = x => x + 1;
let c = a |> mapOption'(inc);
let d = b |> mapOption'(inc);

Unleashing GADTs

I'll say it again as this is crucial for understanding GADTs.

GADTs allow us to explicitly specify types for data constructors.

We'll be using this feature to encode extra information in these types.

Polymorphic returns

Here is a simple ADT defining a variant of primitive values.

type prim =
  | Int(int)
  | Float(float)
  | Bool(bool)
  | Str(string);

Our goal is to write a function that when given a value of type prim it returns the primitive value it holds.

/*
eval(Int(42))
=> 42

eval(Float(4.2))
=> 4.2

eval(Bool(false))
=> false

eval(Str("Hello"))
=> "Hello"
*/

If we attempt to write an evaluator function to return primitive values, we will fail as our return types won't match. This is where GADTs come into play.

Here is the GADT style implementation for our prim type.

type prim('a) =
  | Int(int): prim(int)
  | Float(float): prim(float)
  | Bool(bool): prim(bool)
  | Str(string): prim(string);

A few things to unpack here.

We made prim type polymorphic by specifying 'a
When specify types for data constructors instead of giving prim('a) to each type we gave them a more concrete type otherwise it would be same as a simple ADT.
incase of Int(int) we specified its type not as prim('a) but rather prim(int), and similarly for rest of the data constructors. Thus we encoded extra information in our type.
The usage 'a is similar to how we specify for phantom types, in the sense that we specified concrete types for all occurrences of 'a. GADTs are also sometimes referred as first class phantom type.

Now we can write an eval function over prim and return the primitive values.

let eval: type a. prim(a) => a =
  fun
  | Int(i) => i
  | Float(f) => f
  | Bool(b) => b
  | Str(s) => s;

Notice the type a. above—we are making a a scoped type variable. In the above code we have type a., read this as

for all type a

Thus the whole type signature now reads

for all type a, given a value of type primitive of a it returns a value of type a.

having type a . helps us bring type a in scope and thus we can use it for return type.

We can now use eval to return different primitive types depending on input, thus have polymorphic returns.

let myInt = eval(Int(42));
let myFloat = eval(Float(4.2));
let myBool = eval(Bool(false));
let myStr = eval(Str("Hello"));

This eval function is not possible to implement on simple ADTs like the one declared above.

Heterogeneous List

A list type in Reason can contain values of only a specific type. We can have either a list of string or list of integers but not a list that contains both string and integers.

A list is defined as either an empty list or a list combined with another element (represented as cons).

module List' = {
  type t('a) =
    | []
    | ::(('a, t('a)));
}

let myList = List'.[1, 2, 3, 4];

A few things to note

Here the empty list is represented by [ ] andcons operation by ::
We overload this syntax for empty and cons so that we can get sugar syntax for representing lists via square brackets—this overloading is only supported in OCaml >= 4.0.3

We can see our list has a constraint that all elements should be of same type 'a, but we can use GADTs to implement a heterogeneous list, where elements in the list can be of different type.

module HList = {
  type t =
    | []: t
    | ::(('a, t)): t;

  let rec length: t => int =
    fun
    | [] => 0
    | [h, ...t] => 1 + length(t);
};

Let's unpack the code above

The type of [] is pretty straight forward, but in the type of :: we hide the extra parameter 'a, by specifying the type of :: as simply t.
Here type 'a acts as an existential type.

let myHeteroList = HList.[1, 2.5, false, "abc", Some(5), [1, "def"]];
let myListLenght = HList.length(myHeteroList);

Note Do not use this representation of heterogeneous list, it is only meant for a quick explanation. Existential types—unfortunately nothing to do with works of sartre—are a way of "squashing" a group of types into one, single type. Notice that the type signature just reads poly since we lost all the type information. This makes them quite limited, one won't even be able to write a head function for this implementation.

For a better implementation use difflists. It instead of hiding types keeps track of types of values in the list—explained pretty well in drup's blog

A step closer to dependently typed programs

We know that values depends on thier types. Dependent type is a type whose definition depends on a value. Reason doesn't have dependent types but GADTs can help tighten the constraints between types and their corresponding values. We can use GADTs to pass some extra information at type level for different values of type and in a way simulate dependently typed behaviour to some extent.

Safe List

The head operation on list in Reason (Belt.List.head) returns an option type. If the list has at-least one value, we get Some(value) otherwise we get None.

Our aim is implement a list where head operation refuses to compile on empty list otherwise return the first element of the list without wrapping it in option type.

To implement such behaviour we need to encode some extra information to differentiate between empty and non empty list but still represent them as same list type.

We first represent empty and non empty types separately and further use them to differentiate between empty and non empty list.

module SafeList = {
  type empty = Empty;
  type nonEmpty = NonEmpty;

  type t('a, 's) =
    | []: t('a, empty)
    | ::(('a, t('a, 's))): t('a, nonEmpty);

  let rec length: type s. t(_, s) => int =
    fun
    | [] => 0
    | [h, ...t] => 1 + length(t);

  let head: type a. t(a, nonEmpty) => a =
    fun
    | [x, ..._] => x;
};

Few things to note here

In comparison to the normal List type, we have an extra type parameter passed to the list.
We use the extra parameter to specify if the list is empty or non empty.
Our head function is specialized for only nonEmpty type and thus will not work for empty list.
Contrary to head, the length function is for all types s, thus will work on both empty and non empty lists.

let nonEmptyList = SafeList.[1, 2, 3, 4];
let sizeOfNonEmptyList = SafeList.length(nonEmptyList);
let firstElem = SafeList.head(nonEmptyList);

let emptyList = SafeList.[];
let sizeOfEmptyList = SafeList.length(emptyList);

On an empty list, the head function refuses to compile.

/* let _ = SafeList.head(emptyList); */

uncomment the last line to see the compiler error.

Exercise

Implement a tail function for the SafeList above.

Peano Numbers

We can also take SafeLists a bit further and implement list that is not only aware f it's empty or not but rather aware of the number of elements it contains in its type. To implement a such a list, we have to first implement peano numbers

Peano numbers are a simple way of representing natural numbers³ using only a zero value and a successor function. We can define a type zero and every other natural number in form of its successors.

type zero = Zero;
type succ('a) = Succ('a);
type nat(_) =
  | Zero: nat(zero)
  | Succ(nat('a)): nat(succ('a));

we can now represent natural numbers at type level

type one_ = nat(succ(zero));
type two_ = nat(succ(succ(zero)));
type three_ = nat(succ(succ(succ(zero))));

and further we can reflect these types at value level.

let one: one_ = Succ(Zero);
let two: two_ = Succ(Succ(Zero));
let three: three_ = Succ(two);

to write an icr function for adding one is same as wrapping with Succ

let rec inc: nat('a) => nat(succ('a)) =
  fun
  | n => Succ(n);
let three_ = inc(Succ(Succ(Zero)));

a dec function for subtracting one will only work for non zero nats. dec on zero won't compile as we are dealing only with natural numbers.

let rec dec: nat(succ('a)) => nat('a) =
  fun
  | Succ(a) => a;
let one_ = dec(Succ(Succ(Zero)));

/* let _ = dec(dec(one_)); */

uncomment the last line to see the compiler error.

implementing equality is just recursive pattern matching.

let rec isEqual: type a b. (nat(a), nat(b)) => bool =
  (i, j) =>
    switch (i, j) {
    | (Zero, Zero) => true
    | (Succ(n), Succ(m)) => isEqual(n, m)
    | (_, _) => false
    };

let (=~=) = isEqual;
let isTwoEqualToOne = two =~= one;
let isThreeEqualToSuccTwo = three =~= Succ(two);

finally we can write an eval function to convert to integers.

let rec eval: type a. nat(a) => int =
  fun
  | Zero => 0
  | Succ(m) => 1 + eval(m);
let threeValue = eval(three);
let fourValue = eval(Succ(three));

Length List

Our goal is now to implement list that are aware of their size. We already know how to represent natural numbers at type level using peano numbers and can leverage them to carry length information in types.

module LengthList = {
  type t('a, _) =
    | []: t('a, nat(zero))
    | ::('a, t('a, nat('l))): t('a, nat(succ('l)));

  let rec length: type a. t(_, a) => int =
    fun
    | [] => 0
    | [h, ...t] => 1 + length(t);

  let pop: type a. t(_, nat(succ(a))) => t(_, nat(a)) =
    fun
    | [_, ...xs] => xs

  let push: type a. (t(_, nat(a)), _) => t(_, nat(succ(a))) =
    (l, v) =>
      switch (l) {
      | [] => [v]
      | xs => [v, ...xs]
      };
};

Lots to unpack here

In a similar manner as the SafeList, here we provide an extra type parameter to our list type and use it store the size of the list.
For an empty list the size is zero, and for the cons, the size is succ of the existing list.
In the pop function, we specify in the type signature that input will have at-least one element by saying input type to be nat(succ(a)) and the return will have a size smaller thus having type nat(a)
Same goes for push, we don't have any constraints on length for input type by specifying nat(a) but in the return type length has a type nat(succ(a))
The implementation of push provides type safe guarantee that the list of length will match the one encoded in type and same goes for pop, pop won't compile on empty lists.

let twoElemList = LengthList.[1, 2];
let threeElemList = LengthList.push(twoElemList, 3);
let oneElemList = LengthList.(pop(pop(threeElemList)));

/* let _ = LengthList.(pop(pop(oneElemList))); */

uncomment the last line to see the compiler error.

Note, because of the constraints we put on length type parament for both push and pop, the resultant code is the only possible implementation.

for regular list type, the implementation of push_ below passes the type checker even when it is clearly wrong. By providing length in types of LengthList, we can guarantee to have a safer implementation.

let push_: (list('a), 'a) => list('a) =
  (l, v) =>
    switch (l) {
    | [] => [v]
    | xs => [v]
    };
let _ = push_([1, 2, 3], 4);

Tightening constrains with type has its obvious advantages but it often comes with some compromises for eg. it is not possible to write a filter function for our LengthList.

Exercise

Implement a length aware heterogeneous list.

DSLs with GADTs

To keep this post at a comfortable reading length, writing DSLs with GADTs will be covered in the next post.

[1] As far as i know, apart from dependently typed languages only haskell has GADTs. [2] More specifically Generalized Algebraic types are generalized because they allow constructor terms with differently kinded types. [3] In mathematics some definitions consider only positive numbers as natural numbers and some include 0 as well, but in computer science we prefer the latter and consider 0 as a natural number too.

shrynx/GADTs.md

GADTs: A primer

Generalized Algebraic Data Type

ADTs as GADTs

Unleashing GADTs

Polymorphic returns

Heterogeneous List

A step closer to dependently typed programs

Safe List

Exercise

Peano Numbers

Length List

Exercise

DSLs with GADTs