Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save shinzui/b6695642033b48e9167df8e664b3935c to your computer and use it in GitHub Desktop.
Save shinzui/b6695642033b48e9167df8e664b3935c to your computer and use it in GitHub Desktop.
GADTs That Can Be Newtypes and How to Roll 'Em, 2nd Revision

GADTs That Can Be Newtypes and How to Roll 'Em

I think many people know about Data.Some.Newtype⁠—⁠it uses quite a cute (and cursed) trick to encode a simple existential wrapper as a newtype, hence avoiding an unwanted indirection in the runtime representation.

But there's more to data than existentials⁠—⁠so how far do these tricks go? The answer is: surprisingly far! Let's see what other GADTs we can slim down.

Preliminaries

Before jumping into the deep end, there are a few things to consider and be aware of.

Regular ADTs

Setting GADTs aside for now, which ADTs can be made newtypes? The basic restrictions are, of course: one constructor1; one field1.

The less immediate issue is that of laziness⁠—⁠a newtype has no runtime barrier against the evaluation of its contents, so it can't truly implement lazy data.

Then, since newtypes aren't able to be lazy, are they automatically strict? Actually, no. Since newtype constructors don't exist at runtime, pattern matching on one is no different from pattern matching on a bare variable; it doesn't force anything.

To properly emulate strict data, we must replace the newtype constructor with a strict pattern synonym:

{-# LANGUAGE PatternSynonyms #-}

module Strict (Strict(Strict)) where

newtype Strict a = NonStrict a

{-# COMPLETE Strict #-}
pattern Strict :: a -> Strict a
pattern Strict x <- NonStrict !x
  where Strict x =  NonStrict  x

Soundness

The specialty of GADTs is to carry around not just regular data, but also type equality and class constraints. They're designed such that these constraints are only exposed by pattern matching on legitimately constructed data⁠—⁠this is called soundness. When you write a real GADT you don't need to worry about it, but working magic requires great care.

Suppose, for example, we wanted to implement the following as a newtype:

data Diag f a b where
  Diag :: !(f a a) -> Diag f a a

A Diag f a b witnesses that a ~ b:

diagCoerce :: Diag f a b -> a -> b
diagCoerce (Diag _) x = x

While diagCoerce itself is perfectly valid, if you can supply it a bogus premise then it can give back a bogus result. Consider:

unsafeCoerce :: forall a b. a -> b
unsafeCoerce = diagCoerce (undefined :: Diag (,) a b)

The true GADT maintains soundness through an honourable death⁠—⁠it would rather explode than hand out bad constraints. Our newtypes must be the same⁠—⁠ensure they're appropriately strict!

Existentials

Alright, with that out of the way, let's go over the OG.

The key idea is simple enough. The type information you want to discard, just unsafeCoerce it to the safe target GHC.Exts obligingly provides: Any.

{-# LANGUAGE PatternSynonyms #-}

module Some (Some(Some)) where

import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)

newtype Some f = UnsafeSome (f Any)

{-# COMPLETE Some #-}
pattern Some :: f x -> Some f
pattern Some fx <- UnsafeSome               fx
  where Some fx =  UnsafeSome (unsafeCoerce fx)

It might seem strange that the <- direction type-checks (and I'm not 100% clear on the details myself) but it works because x unifies with Any and isn't bound by the outer universal scope of the pattern type, but its inner existential scope:

pattern Some
  :: forall f. ()
  => forall x. ()
  => f x -> Some f

Some as written above or in Data.Newtype.Some isn't precisely equivalent to either

data Some f where
  Some :: f x -> Some f

or

data Some f where
  Some :: !(f x) -> Some f

since the pattern synonym isn't strict, but this is allowable so long as users are aware⁠—⁠there's no issue of soundness.

Equality Constraints

Equality constraints are the core feature of GADTs⁠—⁠we're lucky that they're well within reach.

The idea is that if we impose an equality constraint upon construction of the data, then we know it's safe to magic up the same when we take it apart.

For the magic itself, it suffices to wield unsafeCoerce against Refl, then ViewPatterns come to the rescue. To demonstrate, let's implement

data Equality a where
  Equality :: L a ~ R a => !(F a) -> Equality a

as a newtype, where L, R and F are stand-ins for whatever pleases you.

{-#
LANGUAGE
  PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}

module Equality (Equality(Equality)) where

import Data.Type.Equality ((:~:)(..))
import Unsafe.Coerce (unsafeCoerce)

-- stand-ins
type family L a
type family R a
type family F a

-- A type parameter that is involved in an equality must be declared nominal!
-- Otherwise, GHC may infer a weaker role and allow it to be `coerce`d to a
-- type that does not satisfy your equality constraint.
type role Equality nominal
newtype   Equality a = UnsafeEquality{ extract :: F a }

view :: forall a. Equality a -> (# L a :~: R a, F a #)
view !eq = (# unsafeCoerce (Refl @(L a)), extract eq #)
  -- ^-- This bang is necessary for soundness!

{-# COMPLETE Equality #-}
pattern Equality :: () => L a ~ R a => F a -> Equality a
pattern Equality x <- (view -> (# Refl, x #))
  where Equality x = UnsafeEquality x

Empty Classes

The same approach as for equality constraints applies to empty classes, since we can magic them up in much the same way. Again, let's implement

data HasProp a where
  HasProp :: Prop a => !(F a) -> HasProp a

as a newtype where Prop and F are stand-ins for an empty class and some data.

{-#
LANGUAGE
  PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}

module HasProp (HasProp(HasProp)) where

import Unsafe.Coerce (unsafeCoerce)

-- stand-ins
class Prop a
type family F a

-- A type parameter that is involved in a class constraint must also be
-- declared nominal, for much the same reason.
type role HasProp nominal
newtype   HasProp a = UnsafeHasProp{ extract :: F a }

-- A data type to help us wield `unsafeCoerce` against constraints.
data Dict c = c => Dict

view :: HasProp a -> (# Dict (Prop a), F a #)
view !x = (# unsafeCoerce (Dict @()), extract x #)
--   ^ Again, necessary for soundness!

{-# COMPLETE HasProp #-}
pattern HasProp :: () => Prop a => F a -> HasProp a
pattern HasProp x <- (view -> (# Dict, x #))
  where HasProp x = UnsafeHasProp x

Computable Classes

The instance we want to magic up does need to be trivial in some sense for our "magic" to work, but it does not strictly need to be empty. So long as we can compute the instance dictionary from the data we have at hand, it's possible to take things a step further using withDict.

To demonstrate, we'll implement

data Computable a where
  Computable :: C a => !(G a) -> Computable a

as an opaque newtype with a pattern synonym, where

class C a where
  method :: H a

given

computeForwards  :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)

with computeBackwards . computeForwards = id.

{-#
LANGUAGE
  PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples,
  TypeFamilies, AllowAmbiguousTypes
#-}

module Computable (Computable(Computable)) where

import GHC.Exts (withDict)

-- stand-ins
type family F a
type family G a
type family H a
class C a where
  method :: H a
computeForwards  :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)
computeForwards  = undefined
computeBackwards = undefined

-- Declare nominal, yada yada.
type role Computable nominal
newtype   Computable a = UnsafeComputable{ extract :: F a }

computable :: forall a. C a => G a -> Computable a
computable x = computeForwards (# method @a, x #)

data Dict c = c => Dict

view :: forall a. Computable a -> (# Dict (C a), G a #)
view !x = case computeBackwards x of
  -- ^ Bang, yada yada.
  (# gx, fx #) -> (# withDict @(C a) gx Dict, fx #)

{-# COMPLETE Computable #-}
pattern Computable :: () => C a => G a -> Computable a
pattern Computable x <- (view -> (# Dict, x #))
  where Computable x = computable x

This is essentially what SNat, SSymbol and SChar all do, except that they have the added complication of needing to invent type indices with rank-2 types, e.g.

{-# NOINLINE withSomeSNat #-}
withSomeSNat :: Natural -> (forall n. SNat n -> r) -> r
withSomeSNat n k = k (UnsafeSNat n)

If you're doing the same, ensure the function that introduces the indices is NOINLINEd! See Note [NOINLINE withSomeSNat] for details.

Arbitrary Embeddings

In the previous section, we used

computeForwards  :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)

in construction and matching respectively. You might already be asking yourself: so long as we embed it reversibly in Computable a, couldn't we use any type we wanted? And indeed, you would not be wrong⁠—⁠we can even support sums!

To demonstrate, we'll implement

data Embedding a where
  Foo :: C x a => G x -> Embedding a
  Bar :: D x a => H x -> Embedding a

as an opaque newtype with pattern synonyms, given

embed :: GADT.Embedding a ->      Embedding a
debed ::      Embedding a -> GADT.Embedding a

with debed . embed = id, where module GADT exports the standard form above.

{-# LANGUAGE PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilies #-}

module Embedding (Embedding(Foo,Bar)) where

import qualified GADT

-- stand-ins
import GADT (C, D, G, H)
type family F a
embed :: GADT.Embedding a ->      Embedding a
debed ::      Embedding a -> GADT.Embedding a
embed    = undefined
debed !x = undefined x
   -- ^ debed should be strict

                 -- vvvvvvv nom
type role Embedding nominal
newtype   Embedding a = UnsafeEmbedding{ extract :: F a }

{-# COMPLETE Foo, Bar #-}
pattern Foo :: () => C x a => G x -> Embedding a
pattern Bar :: () => D x a => H x -> Embedding a

pattern Foo gx <- (debed -> GADT.Foo gx)
  where Foo gx =   embed  $ GADT.Foo gx
pattern Bar hx <- (debed -> GADT.Bar hx)
  where Bar hx =   embed  $ GADT.Bar hx

Internally this may only be an embedding, but from the outside it's a true isomorphism. Opacity ensures:

$$ \forall x. \kern3pt \exists y. \kern3pt x = \mathrm{embed} \kern3pt y $$

Hence embed and debed are also inverses in the other direction:

$$ \forall x. \kern3pt \mathrm{embed} \kern3pt \left(\mathrm{debed} \kern3pt x\right) = \mathrm{embed} \kern3pt \left( \mathrm{debed} \kern3pt \left(\mathrm{embed} \kern3pt y\right) \right) = \mathrm{embed} \kern3pt y = x $$

This approach generalises all previous approaches, and at a glance, it looks the simplest and cleanest of the lot. However, the dirt and difficulties are hiding in the directions of the embedding; the dark magicks demonstrated in previous sections will still be needed within. Only, now their form is less constrained and there's room for creativity⁠—⁠room to shoot yourself in the foot. Prefer the previous approaches when they suffice.

You might have noticed something else different⁠—⁠the fields of Foo and Bar are lazy? Indeed, they're allowed to be; only debed is required to be strict for soundness. In practice, however⁠—⁠if the newtype representation is to actually slim down your data⁠—⁠embed will re-encode its arguments with minimal runtime indirections, and hence minimal laziness. Consequently, the fields will almost always end up strict.

Keeping It Shallow

The previous section is all-encompassing, so what's left to explore? Well, there's an issue with how it handles recursive data types. To demonstrate, suppose we were implementing

data Fin (n :: Peano) where
  Zero ::             Fin (S n)
  Succ :: !(Fin n) -> Fin (S n)

as a newtype. Following the recipe mechanically, we would end up with

pattern Succ :: () => sn ~ S n => GADT.Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ fp)
  where Succ fp =   embed  $ GADT.Succ fp

but that isn't even the right type. The naïve fix is to reuse embed and debed within:

pattern Succ :: () => sn ~ S n => Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ (embed -> fp))
  where Succ fp =   embed  $ GADT.Succ (debed  $ fp)

But don't be taken in by the duality⁠—⁠that thing's digging holes just to fill them back in again. We should insist that our pattern matching be shallow. As it happens, there's a nice way to think about this.

Both the recursive GADT and its isomorphic2 opaque newtype implementation are fixpoints of the same highest-order base functor3 FinF. So alongside the usual isomorphism

inGADT  :: FinF GADT.Fin n ->      GADT.Fin n
outGADT ::      GADT.Fin n -> FinF GADT.Fin n

there's also

inNT  :: FinF Fin n ->      Fin n
outNT ::      Fin n -> FinF Fin n

and that's exactly what we need!

To demonstrate in full, let's implement

data Recursive a where
  Base :: C x   a => G x                -> Recursive a
  Rec  :: D x y a => H x -> Recursive y -> Recursive a

as an opaque newtype with pattern synonyms, given

in_ :: RecursiveF Recursive a ->            Recursive a
out ::            Recursive a -> RecursiveF Recursive a

with out . in_ = id, where RecursiveF is the highest-order base functor3 of Recursive.

{-#
LANGUAGE
  PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilyDependencies
#-}

module Recursive (Recursive(Base,Rec)) where

import Data.Kind (Constraint)

-- highest-order base functor (functorial in `rec`, not `a`)
data RecursiveF rec a where
  BaseF :: C x   a => G x          -> RecursiveF rec a
  RecF  :: D x y a => H x -> rec y -> RecursiveF rec a

-- stand-ins
type family C x   a :: Constraint
type family D x y a :: Constraint
type family G x = r | r -> x
type family H x = r | r -> x
type family F a
in_ :: RecursiveF Recursive a ->            Recursive a
out ::            Recursive a -> RecursiveF Recursive a
in_    = undefined
out !x = undefined x
 -- ^ out should be strict

                 -- vvvvvvv nom
type role Recursive nominal
newtype   Recursive a = UnsafeRecursive{ extract :: F a }

{-# COMPLETE Base, Rec #-}
pattern Base :: () => C x   a => G x                -> Recursive a
pattern Rec  :: () => D x y a => H x -> Recursive y -> Recursive a

pattern Base gx    <- (out -> BaseF gx)
  where Base gx    =   in_  $ BaseF gx
pattern Rec  hx ry <- (out -> RecF  hx ry)
  where Rec  hx ry =   in_  $ RecF  hx ry

Unboxed GADTs

This already follows from the previous sections, but I think it's a fact worthy of note: the newtype encoding allows us to write unboxed GADTs!

UnliftedDatatypes, true to its name, just lets you declare data of kind UnliftedType i.e. TYPE (Boxed Unlifted). UnliftedNewtypes on the other hand is much more general, and should perhaps have been called UnboxedNewtypes; it lets you write newtypes over TYPEs of any RuntimeRep.

GADTs and unboxing are hence mutually exclusive in standard formulation, but allowed to coexist in newtype form. Since we haven't actually written out any embeddings yet, let's take this opportunity to flesh out a complete example: let's implement the GADT

data Number# :: forall rr. TYPE rr -> UnliftedType where
  Int#    :: Int#    -> Number# Int#
  Double# :: Double# -> Number# Double#

at kind forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep]).

{-#
LANGUAGE
  ExplicitNamespaces, MagicHash, UnboxedSums, UnliftedNewtypes, LambdaCase,
  GADTs, DataKinds, PatternSynonyms, ViewPatterns, RoleAnnotations
#-}

module Number (Number#(Int#,Double#)) where

import GHC.Exts
  (TYPE, RuntimeRep(SumRep,IntRep,DoubleRep), Int#, Double#, type (~~))
import Data.Type.Equality ((:~~:)(..))
import Unsafe.Coerce (unsafeCoerce)

import qualified GADT

embed :: forall {rr} (n :: TYPE rr). GADT.Number# n ->      Number# n
debed :: forall {rr} (n :: TYPE rr).      Number# n -> GADT.Number# n
embed = \case
  GADT.Int#    x -> UnsafeNumber# (# x |   #)
  GADT.Double# y -> UnsafeNumber# (#   | y #)
debed (UnsafeNumber# s) = case s of
  (# x |   #) -> unsafeEquality @n @Int#    (GADT.Int#    x)
  (#   | y #) -> unsafeEquality @n @Double# (GADT.Double# y)

-- We need /kind-heterogeneous/ equality, both here and in our pattern types.
unsafeEquality
  :: forall {k} {l} {rr} (p :: k) (q :: l) (r :: TYPE rr)
  .  (p ~~ q => r) -> r
unsafeEquality r = case magic of HRefl -> r
 where
  magic :: p :~~: q
  magic = unsafeCoerce (HRefl @p)

type role Number# nominal
type      Number# :: forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep])
newtype   Number# n = UnsafeNumber# (# Int# | Double# #)

{-# COMPLETE Int#, Double# #-}
pattern Int#
  :: forall {rr} {n :: TYPE rr}. () => n ~~ Int#    => Int#    -> Number# n
pattern Double#
  :: forall {rr} {n :: TYPE rr}. () => n ~~ Double# => Double# -> Number# n

pattern Int#    i <- (debed -> GADT.Int#    i)
  where Int#    i =   embed  $ GADT.Int#    i
pattern Double# x <- (debed -> GADT.Double# x)
  where Double# x =   embed  $ GADT.Double# x

Closing Notes

And I believe that covers everything!

Just remember that these newtypes aren't like regular correct-by-construction data types⁠—⁠you have to treat them like difficult logic and test them thoroughly: what should compile; what shouldn't; what should explode; etc.

Take care, and happy hacking!

Footnotes

Footnotes

  1. Not strictly true, as we'll see later. 2

  2. This argument is somewhat backwards, and only follows if we've already written embed and debed as in the previous section. However, even if we haven't, so long as we write in_ and out, opacity will again ensure that they're true inverses. That makes the newtype a genuine fixpoint and hence isomorphic to the GADT in question.

  3. Mechanically derived by replacing recursive occurrences of the type constructor with a type parameter. If the recursive data type has kind K, then its highest-order base functor has kind K -> K (where kind variable quantifiers are implicit, not captured in the metavariable K). 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment