Skip to content

Instantly share code, notes, and snippets.

@polytypic
Last active September 26, 2024 09:46
Show Gist options
  • Save polytypic/fc8d71ae29fac9c770b3b91d81bf5f4f to your computer and use it in GitHub Desktop.
Save polytypic/fc8d71ae29fac9c770b3b91d81bf5f4f to your computer and use it in GitHub Desktop.
A pattern for using GADTs to subset cases

A pattern for using GADTs to subset cases

Consider the following straightforward mutable queue implementation using two stacks:

type 'a adt_queue = {
  mutable head : 'a head;
  mutable tail : 'a tail;
}

and 'a head = Cons of 'a * 'a head | Head
and 'a tail = Snoc of 'a tail * 'a | Tail

let create () = { head = Head; tail = Tail }

let push t x = t.tail <- Snoc (t.tail, x)

let rec rev_to head = function
  | Tail -> head
  | Snoc (xs, x) -> rev_to (Cons (x, head)) xs

let rev tail = rev_to Head tail

let pop_opt t =
  match t.head with
  | Cons (x, xs) ->
    t.head <- xs;
    Some x
  | Head ->
    match t.tail with
    | Tail -> None
    | Snoc _ as tail ->
      match rev tail with
      | Head -> assert false (* <-- This is annoying! *)
      | Cons (x, xs) ->
        t.head <- xs;
        Some x

You might have noticed the use of the separate _ head and _ tail list type constructors for the head and the tail, respectively. For the above having such types is not strictly necessary, but it is a stepping stone for the changes we'll introduce soon.

As pointed out by the comment, one annoying thing there is that at the point we match on the result of rev tail we know that it will not be an empty list, or the constructor Head in this case, but the OCaml compiler doesn't know that.

Could we use GADTs to avoid that unnecessary case? Well, yes, we can. The problem with GADTs, however, is that they tend to be tricky to use. If you've used them in anger, so to speak, you probably know what I'm talking about. OTOH, if you find them entirely intuitive to use, you might want to consider writing a tutorial or two on them! In the following I'll describe a pattern for using GADTs with polymorphic variants and existentials that allows cases like the above to be handled fairly mechanically. It should be noted, however, that this is not the only way to achieve the effect we want, which is to eliminate the unnecessary case from the above. This is just a pattern that makes it straightforward and mechanical.

Consider the following GADT:

type _ t =
  | Foo : [> `Foo ] t
  | Bar : [> `Bar ] t
  | Baz : [> `Baz ] t

Each of the cases mechanically includes the label as a polymorphic variant that also allows more tags (i.e. they have the >). This allows us to specify functions that only accept a subset of the cases as long as we explicitly type them and mechanically specify them to only allow the desired tags or less (i.e. they have the <):

let foo_or_bar (t : [< `Foo | `Bar ] t) =
  match t with
  | Foo -> "It was a Foo"
  | Bar -> "It was a Bar"

let _ = foo_or_bar Foo
let _ = foo_or_bar Bar

let just_baz (t : [< `Baz ] t) =
  match t with
  | Baz -> "It is a Baz"

let _ = just_baz Baz

Specifying the bounds like this can be a bit cumbersome and it also requires polymorphism. We can also specify types that only allow a specific subset by using existentials. For example, we could define an existential for the subset that includes Foo and Bar:

type foo_or_bar = Foo_or_Bar : [< `Foo | `Bar] t -> foo_or_bar [@@unboxed]

Note we marked the existential type as [@@unboxed]. This is important for performance (and sometimes other) reasons. It basically means that the Foo_or_Bar constructor is the identity function and using it allocates no memory.

Just like when specifying the bounds explicitly, having a foo_or_bar value we can pattern match on the subset of cases it allows:

let foo_or_bar (Foo_or_Bar t) =
  match t with
  | Foo -> "It was a Foo"
  | Bar -> "It was a Bar"

Nice!

Let's then put the above to use and reimplement the queue in a mechanical way avoiding the unnecessary pattern match for the empty list.

First we define a new set of types for the queue. Just like we defined the foo_or_bar existential for the _ t GADT, for the new GADTs, (_, _) head and (_, _) tail, we also define _ head_pack and _ tail_pack existentials:

type 'a gadt_queue = {
  mutable head : 'a head_pack;
  mutable tail : 'a tail_pack;
}

and ('a, _) head =
  | Cons : 'a * 'a head_pack -> ('a, [> `Cons ]) head
  | Head : ('a, [> `Head ]) head

and 'a head_pack =
  | H : ('a, [< `Cons | `Head ]) head -> 'a head_pack [@@unboxed]

and ('a, _) tail =
  | Snoc : 'a tail_pack * 'a -> ('a, [> `Snoc ]) tail
  | Tail : ('a, [> `Tail ]) tail

and 'a tail_pack =
  | T : ('a, [< `Snoc | `Tail ]) tail -> 'a tail_pack [@@unboxed]

Complicated as the above might seem at first glance, one can arrive at the above mechanically based on the previous discussion.

The create function is entirely straightforward:

let create () = { head = H Head; tail = T Tail }

The only addition is that we used the existential constructors H and T for the head and tail, respectively.

The same goes for the push operation, we simply tag the new tail with the existential constructor T:

let push t x = t.tail <- T (Snoc (t.tail, x))

For the rev and rev_to functions we use both explicitly bounded types and also the existential tags where needed:

let rec rev_to (head : (_, [< `Cons ]) head) = function
  | T Tail -> head
  | T (Snoc (xs, x)) -> rev_to (Cons (x, H head)) xs

let rev = function (Snoc (tail, x) : (_, [< `Snoc ]) tail) ->
  rev_to (Cons (x, H Head)) tail

Finally we can write pop_opt:

let pop_opt t =
  match t.head with
  | H (Cons (x, xs)) ->
    t.head <- xs;
    Some x
  | H Head ->
    match t.tail with
    | T Tail -> None
    | T (Snoc _ as tail) ->
      t.tail <- T Tail;
      match rev tail with
      | Cons (x, xs) ->
        t.head <- xs;
        Some x

All of the above compile with no errors nor warnings and we avoid the unnecessary pattern match case on rev tail!

Like said earlier, the nice thing about this approach is that this is fairly mechanical. You just tag the GADT constructors with their names and you explicitly specify the subset of constructors when needed. You can then use existentials to specify subsets as convenience types and to encapsulate the polymorphism.

With that said, it is possible to achieve the same with precision use of GADTs, but it gets tricky.

(mdx
(files article.md))
(lang dune 3.8)
(using mdx 0.4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment