This post assumes you are familiar with Algebraic Data Types (sum types/variants/tagged unions and product types: records, tuples, etc) and comfortable reading type signatures. For a brush up on ADTs and specifically on sum types/variants, refer Reason's documentation and rauschmayer's blog on variants in Reason. Also while you are at it, a quick look on phantom types wouldn't hurt 🙂.
Generalized Algebraic Data Type—which I will refer to by the more radio-friendly name GADT hereafter—is a powerful and quite a unique¹ feature. If you are coming from an untyped or a weakly typed language and switched to Reason for the type safety it provides, then GADTs are just one more step further in providing more compile time guarantees. In a nutshell GADTs allow us more control on type safety than simple ADTs offer.
Some common usecases for GADTs:
-
Generic programming GADTs allows us to do a bit of runtime type checking but in a type safe manner—further discussed in the post how GADTs can be used for polymorphic returns. although not covered in this post but jane street's blogpost explains how GADTs can be used for making generalized data structures for performance gains.
-
Verified Data structures by having more control on types, we can make data structures e.g. a self balancing trees that would refuse to compile if unbalanced.
-
Type-safe DSLs GATDs are quite extensively used for building domain specific languages for e.g. dsl for type safe database access or type safe
printf
.
Here is a simple ADT representing booleans. This is isomorphic to the existing bool
type in Reason.
type bool' = True | False
GADTs enable us to explicitly specify types for data constructors² when defining them. This is the key point of GADTs.
Thus the above ADT can be written in a GADT style by stating both True
and False
have a type bool'
.
type bool' =
| True: bool'
| False: bool';
Now we have promoted our simple ADT for booleans to a GADT representation, but what did we gain out of it ?
Well, nothing. Literally, we didn't gain anything more that what the ADT could already do, but let's keep continuing and make ourselves a little more familiar with GADTs.
Like we did for boolean
we can do the same for option
type and see that it still behave the same. We can also go ahead and implement a map
function same as the Belt.Option.map
for our promoted option'
type.
type option'('a) =
| None': option'('a)
| Some'('a): option'('a);
let mapOption' = (f, opt) =>
switch (opt) {
| None' => None'
| Some'(x) => Some'(f(x))
};
let a = Some'(5);
let b = None';
let inc = x => x + 1;
let c = a |> mapOption'(inc);
let d = b |> mapOption'(inc);
I'll say it again as this is crucial for understanding GADTs.
GADTs allow us to explicitly specify types for data constructors.
We'll be using this feature to encode extra information in these types.
Here is a simple ADT defining a variant of primitive values.
type prim =
| Int(int)
| Float(float)
| Bool(bool)
| Str(string);
Our goal is to write a function that when given a value of type prim
it returns the primitive value it holds.
/*
eval(Int(42))
=> 42
eval(Float(4.2))
=> 4.2
eval(Bool(false))
=> false
eval(Str("Hello"))
=> "Hello"
*/
If we attempt to write an evaluator function to return primitive values, we will fail as our return types won't match. This is where GADTs come into play.
Here is the GADT style implementation for our prim
type.
type prim('a) =
| Int(int): prim(int)
| Float(float): prim(float)
| Bool(bool): prim(bool)
| Str(string): prim(string);
A few things to unpack here.
-
We made
prim
type polymorphic by specifying'a
-
When specify types for data constructors instead of giving
prim('a)
to each type we gave them a more concrete type otherwise it would be same as a simple ADT. -
incase of
Int(int)
we specified its type not asprim('a)
but ratherprim(int)
, and similarly for rest of the data constructors. Thus we encoded extra information in our type. -
The usage
'a
is similar to how we specify for phantom types, in the sense that we specified concrete types for all occurrences of'a
. GADTs are also sometimes referred as first class phantom type.
Now we can write an eval
function over prim
and return the primitive values.
let eval: type a. prim(a) => a =
fun
| Int(i) => i
| Float(f) => f
| Bool(b) => b
| Str(s) => s;
Notice the type a.
above—we are making a
a scoped type variable.
In the above code we have type a.
, read this as
for all type a
Thus the whole type signature now reads
for all type a, given a value of type primitive of a it returns a value of type a.
having type a .
helps us bring type a
in scope and thus we can use it for return type.
We can now use eval to return different primitive types depending on input, thus have polymorphic returns.
let myInt = eval(Int(42));
let myFloat = eval(Float(4.2));
let myBool = eval(Bool(false));
let myStr = eval(Str("Hello"));
This eval
function is not possible to implement on simple ADTs like the one declared above.
A list
type in Reason can contain values of only a specific type. We can have either a list of string or list of integers but not a list that contains both string and integers.
A list is defined as either an empty list or a list combined with another element (represented as cons
).
module List' = {
type t('a) =
| []
| ::(('a, t('a)));
}
let myList = List'.[1, 2, 3, 4];
A few things to note
- Here the empty list is represented by
[ ]
andcons
operation by::
- We overload this syntax for empty and cons so that we can get sugar syntax for representing lists via square brackets—this overloading is only supported in OCaml >= 4.0.3
We can see our list has a constraint that all elements should be of same type 'a
, but we can use GADTs to implement a heterogeneous list, where elements in the list can be of different type.
module HList = {
type t =
| []: t
| ::(('a, t)): t;
let rec length: t => int =
fun
| [] => 0
| [h, ...t] => 1 + length(t);
};
Let's unpack the code above
- The type of
[]
is pretty straight forward, but in the type of::
we hide the extra parameter'a
, by specifying the type of::
as simplyt
. - Here type
'a
acts as an existential type.
let myHeteroList = HList.[1, 2.5, false, "abc", Some(5), [1, "def"]];
let myListLenght = HList.length(myHeteroList);
Note Do not use this representation of heterogeneous list, it is only meant for a quick explanation.
Existential types—unfortunately nothing to do with works of sartre—are a way of "squashing" a group of types into one, single type. Notice that the type signature just reads poly
since we lost all the type information. This makes them quite limited, one won't even be able to write a head
function for this implementation.
For a better implementation use difflists
. It instead of hiding types keeps track of types of values in the list—explained pretty well in drup's blog
We know that values depends on thier types. Dependent type is a type whose definition depends on a value. Reason doesn't have dependent types but GADTs can help tighten the constraints between types and their corresponding values. We can use GADTs to pass some extra information at type level for different values of type and in a way simulate dependently typed behaviour to some extent.
The head
operation on list in Reason (Belt.List.head) returns an option
type. If the list has at-least one value, we get Some(value)
otherwise we get None
.
Our aim is implement a list where head
operation refuses to compile on empty list otherwise return the first element of the list without wrapping it in option type.
To implement such behaviour we need to encode some extra information to differentiate between empty
and non empty
list but still represent them as same list type.
We first represent empty and non empty types separately and further use them to differentiate between empty
and non empty
list.
module SafeList = {
type empty = Empty;
type nonEmpty = NonEmpty;
type t('a, 's) =
| []: t('a, empty)
| ::(('a, t('a, 's))): t('a, nonEmpty);
let rec length: type s. t(_, s) => int =
fun
| [] => 0
| [h, ...t] => 1 + length(t);
let head: type a. t(a, nonEmpty) => a =
fun
| [x, ..._] => x;
};
Few things to note here
- In comparison to the normal List type, we have an extra type parameter passed to the list.
- We use the extra parameter to specify if the list is empty or non empty.
- Our
head
function is specialized for onlynonEmpty
type and thus will not work forempty
list. - Contrary to
head
, the length function is for all typess
, thus will work on bothempty
andnon empty
lists.
let nonEmptyList = SafeList.[1, 2, 3, 4];
let sizeOfNonEmptyList = SafeList.length(nonEmptyList);
let firstElem = SafeList.head(nonEmptyList);
let emptyList = SafeList.[];
let sizeOfEmptyList = SafeList.length(emptyList);
On an empty list, the head
function refuses to compile.
/* let _ = SafeList.head(emptyList); */
uncomment the last line to see the compiler error.
Implement a tail
function for the SafeList
above.
We can also take SafeLists
a bit further and implement list that is not only aware f it's empty or not but rather aware of the number of elements it contains in its type.
To implement a such a list, we have to first implement peano numbers
Peano numbers are a simple way of representing natural numbers³ using only a zero value and a successor function. We can define a type zero and every other natural number in form of its successors.
type zero = Zero;
type succ('a) = Succ('a);
type nat(_) =
| Zero: nat(zero)
| Succ(nat('a)): nat(succ('a));
we can now represent natural numbers at type level
type one_ = nat(succ(zero));
type two_ = nat(succ(succ(zero)));
type three_ = nat(succ(succ(succ(zero))));
and further we can reflect these types at value level.
let one: one_ = Succ(Zero);
let two: two_ = Succ(Succ(Zero));
let three: three_ = Succ(two);
to write an icr
function for adding one is same as wrapping with Succ
let rec inc: nat('a) => nat(succ('a)) =
fun
| n => Succ(n);
let three_ = inc(Succ(Succ(Zero)));
a dec
function for subtracting one will only work for non zero nats.
dec
on zero won't compile as we are dealing only with natural numbers.
let rec dec: nat(succ('a)) => nat('a) =
fun
| Succ(a) => a;
let one_ = dec(Succ(Succ(Zero)));
/* let _ = dec(dec(one_)); */
uncomment the last line to see the compiler error.
implementing equality is just recursive pattern matching.
let rec isEqual: type a b. (nat(a), nat(b)) => bool =
(i, j) =>
switch (i, j) {
| (Zero, Zero) => true
| (Succ(n), Succ(m)) => isEqual(n, m)
| (_, _) => false
};
let (=~=) = isEqual;
let isTwoEqualToOne = two =~= one;
let isThreeEqualToSuccTwo = three =~= Succ(two);
finally we can write an eval
function to convert to integers.
let rec eval: type a. nat(a) => int =
fun
| Zero => 0
| Succ(m) => 1 + eval(m);
let threeValue = eval(three);
let fourValue = eval(Succ(three));
Our goal is now to implement list that are aware of their size. We already know how to represent natural numbers at type level using peano numbers and can leverage them to carry length information in types.
module LengthList = {
type t('a, _) =
| []: t('a, nat(zero))
| ::('a, t('a, nat('l))): t('a, nat(succ('l)));
let rec length: type a. t(_, a) => int =
fun
| [] => 0
| [h, ...t] => 1 + length(t);
let pop: type a. t(_, nat(succ(a))) => t(_, nat(a)) =
fun
| [_, ...xs] => xs
let push: type a. (t(_, nat(a)), _) => t(_, nat(succ(a))) =
(l, v) =>
switch (l) {
| [] => [v]
| xs => [v, ...xs]
};
};
Lots to unpack here
- In a similar manner as the SafeList, here we provide an extra type parameter to our list type and use it store the size of the list.
- For an
empty
list the size is zero, and for thecons
, the size issucc
of the existing list. - In the
pop
function, we specify in the type signature that input will have at-least one element by saying input type to benat(succ(a))
and the return will have a size smaller thus having typenat(a)
- Same goes for
push
, we don't have any constraints on length for input type by specifyingnat(a)
but in the return type length has a typenat(succ(a))
- The implementation of
push
provides type safe guarantee that the list of length will match the one encoded in type and same goes forpop
,pop
won't compile on empty lists.
let twoElemList = LengthList.[1, 2];
let threeElemList = LengthList.push(twoElemList, 3);
let oneElemList = LengthList.(pop(pop(threeElemList)));
/* let _ = LengthList.(pop(pop(oneElemList))); */
uncomment the last line to see the compiler error.
Note, because of the constraints we put on length type parament for both push
and pop
, the resultant code is the only possible implementation.
for regular list type, the implementation of push_
below passes the type checker even when it is clearly wrong. By providing length in types of LengthList
, we can guarantee to have a safer implementation.
let push_: (list('a), 'a) => list('a) =
(l, v) =>
switch (l) {
| [] => [v]
| xs => [v]
};
let _ = push_([1, 2, 3], 4);
Tightening constrains with type has its obvious advantages but it often comes with some compromises for eg. it is not possible to write a filter
function for our LengthList
.
Implement a length aware heterogeneous list.
To keep this post at a comfortable reading length, writing DSLs with GADTs will be covered in the next post.
[1] As far as i know, apart from dependently typed languages only haskell has GADTs. [2] More specifically Generalized Algebraic types are generalized because they allow constructor terms with differently kinded types. [3] In mathematics some definitions consider only positive numbers as natural numbers and some include 0 as well, but in computer science we prefer the latter and consider 0 as a natural number too.