I think many people know about Data.Some.Newtype
—it uses quite a cute (and cursed) trick to encode a simple existential wrapper as a newtype
, hence avoiding an unwanted indirection in the runtime representation.
But there's more to data
than existentials—so how far do these tricks go?
The answer is: surprisingly far!
Let's see what other GADTs we can slim down.
Before jumping into the deep end, there are a few things to consider and be aware of.
Setting GADTs aside for now, which ADTs can be made newtype
s?
The basic restrictions are, of course: one constructor1; one field1.
The less immediate issue is that of laziness—a newtype
has no runtime barrier against the evaluation of its contents, so it can't truly implement lazy data
.
Then, since newtype
s aren't able to be lazy, are they automatically strict?
Actually, no.
Since newtype
constructors don't exist at runtime, pattern matching on one is no different from pattern matching on a bare variable; it doesn't force anything.
To properly emulate strict data
, we must replace the newtype
constructor with a strict pattern synonym:
{-# LANGUAGE PatternSynonyms #-}
module Strict (Strict(Strict)) where
newtype Strict a = NonStrict a
{-# COMPLETE Strict #-}
pattern Strict :: a -> Strict a
pattern Strict x <- NonStrict !x
where Strict x = NonStrict x
The specialty of GADTs is to carry around not just regular data, but also type equality and class constraints. They're designed such that these constraints are only exposed by pattern matching on legitimately constructed data—this is called soundness. When you write a real GADT you don't need to worry about it, but working magic requires great care.
Suppose, for example, we wanted to implement the following as a newtype:
data Diag f a b where
Diag :: !(f a a) -> Diag f a a
A Diag f a b
witnesses that a ~ b
:
diagCoerce :: Diag f a b -> a -> b
diagCoerce (Diag _) x = x
While diagCoerce
itself is perfectly valid, if you can supply it a bogus premise then it can give back a bogus result.
Consider:
unsafeCoerce :: forall a b. a -> b
unsafeCoerce = diagCoerce (undefined :: Diag (,) a b)
The true GADT maintains soundness through an honourable death—it would rather explode than hand out bad constraints.
Our newtype
s must be the same—ensure they're appropriately strict!
Alright, with that out of the way, let's go over the OG.
The key idea is simple enough.
The type information you want to discard, just unsafeCoerce
it to the safe target GHC.Exts
obligingly provides: Any
.
{-# LANGUAGE PatternSynonyms #-}
module Some (Some(Some)) where
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)
newtype Some f = UnsafeSome (f Any)
{-# COMPLETE Some #-}
pattern Some :: f x -> Some f
pattern Some fx <- UnsafeSome fx
where Some fx = UnsafeSome (unsafeCoerce fx)
It might seem strange that the <-
direction type-checks (and I'm not 100% clear on the details myself) but it works because x
unifies with Any
and isn't bound by the outer universal scope of the pattern type, but its inner existential scope:
pattern Some
:: forall f. ()
=> forall x. ()
=> f x -> Some f
Some
as written above or in Data.Newtype.Some
isn't precisely equivalent to either
data Some f where
Some :: f x -> Some f
or
data Some f where
Some :: !(f x) -> Some f
since the pattern synonym isn't strict, but this is allowable so long as users are aware—there's no issue of soundness.
Equality constraints are the core feature of GADTs—we're lucky that they're well within reach.
The idea is that if we impose an equality constraint upon construction of the data, then we know it's safe to magic up the same when we take it apart.
For the magic itself, it suffices to wield unsafeCoerce
against Refl
, then ViewPatterns
come to the rescue.
To demonstrate, let's implement
data Equality a where
Equality :: L a ~ R a => !(F a) -> Equality a
as a newtype
, where L
, R
and F
are stand-ins for whatever pleases you.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}
module Equality (Equality(Equality)) where
import Data.Type.Equality ((:~:)(..))
import Unsafe.Coerce (unsafeCoerce)
-- stand-ins
type family L a
type family R a
type family F a
-- A type parameter that is involved in an equality must be declared nominal!
-- Otherwise, GHC may infer a weaker role and allow it to be `coerce`d to a
-- type that does not satisfy your equality constraint.
type role Equality nominal
newtype Equality a = UnsafeEquality{ extract :: F a }
view :: forall a. Equality a -> (# L a :~: R a, F a #)
view !eq = (# unsafeCoerce (Refl @(L a)), extract eq #)
-- ^-- This bang is necessary for soundness!
{-# COMPLETE Equality #-}
pattern Equality :: () => L a ~ R a => F a -> Equality a
pattern Equality x <- (view -> (# Refl, x #))
where Equality x = UnsafeEquality x
The same approach as for equality constraints applies to empty classes, since we can magic them up in much the same way. Again, let's implement
data HasProp a where
HasProp :: Prop a => !(F a) -> HasProp a
as a newtype
where Prop
and F
are stand-ins for an empty class and some data.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}
module HasProp (HasProp(HasProp)) where
import Unsafe.Coerce (unsafeCoerce)
-- stand-ins
class Prop a
type family F a
-- A type parameter that is involved in a class constraint must also be
-- declared nominal, for much the same reason.
type role HasProp nominal
newtype HasProp a = UnsafeHasProp{ extract :: F a }
-- A data type to help us wield `unsafeCoerce` against constraints.
data Dict c = c => Dict
view :: HasProp a -> (# Dict (Prop a), F a #)
view !x = (# unsafeCoerce (Dict @()), extract x #)
-- ^ Again, necessary for soundness!
{-# COMPLETE HasProp #-}
pattern HasProp :: () => Prop a => F a -> HasProp a
pattern HasProp x <- (view -> (# Dict, x #))
where HasProp x = UnsafeHasProp x
The instance we want to magic up does need to be trivial in some sense for our "magic" to work, but it does not strictly need to be empty.
So long as we can compute the instance dictionary from the data we have at hand, it's possible to take things a step further using withDict
.
To demonstrate, we'll implement
data Computable a where
Computable :: C a => !(G a) -> Computable a
as an opaque newtype with a pattern synonym, where
class C a where
method :: H a
given
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)
with computeBackwards . computeForwards = id
.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples,
TypeFamilies, AllowAmbiguousTypes
#-}
module Computable (Computable(Computable)) where
import GHC.Exts (withDict)
-- stand-ins
type family F a
type family G a
type family H a
class C a where
method :: H a
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)
computeForwards = undefined
computeBackwards = undefined
-- Declare nominal, yada yada.
type role Computable nominal
newtype Computable a = UnsafeComputable{ extract :: F a }
computable :: forall a. C a => G a -> Computable a
computable x = computeForwards (# method @a, x #)
data Dict c = c => Dict
view :: forall a. Computable a -> (# Dict (C a), G a #)
view !x = case computeBackwards x of
-- ^ Bang, yada yada.
(# gx, fx #) -> (# withDict @(C a) gx Dict, fx #)
{-# COMPLETE Computable #-}
pattern Computable :: () => C a => G a -> Computable a
pattern Computable x <- (view -> (# Dict, x #))
where Computable x = computable x
This is essentially what SNat
, SSymbol
and SChar
all do, except that they have the added complication of needing to invent type indices with rank-2 types, e.g.
{-# NOINLINE withSomeSNat #-}
withSomeSNat :: Natural -> (forall n. SNat n -> r) -> r
withSomeSNat n k = k (UnsafeSNat n)
If you're doing the same, ensure the function that introduces the indices is NOINLINE
d!
See Note [NOINLINE withSomeSNat]
for details.
In the previous section, we used
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)
in construction and matching respectively.
You might already be asking yourself: so long as we embed it reversibly in Computable a
, couldn't we use any type we wanted?
And indeed, you would not be wrong—we can even support sums!
To demonstrate, we'll implement
data Embedding a where
Foo :: C x a => G x -> Embedding a
Bar :: D x a => H x -> Embedding a
as an opaque newtype with pattern synonyms, given
embed :: GADT.Embedding a -> Embedding a
debed :: Embedding a -> GADT.Embedding a
with debed . embed = id
, where module GADT
exports the standard form above.
{-# LANGUAGE PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilies #-}
module Embedding (Embedding(Foo,Bar)) where
import qualified GADT
-- stand-ins
import GADT (C, D, G, H)
type family F a
embed :: GADT.Embedding a -> Embedding a
debed :: Embedding a -> GADT.Embedding a
embed = undefined
debed !x = undefined x
-- ^ debed should be strict
-- vvvvvvv nom
type role Embedding nominal
newtype Embedding a = UnsafeEmbedding{ extract :: F a }
{-# COMPLETE Foo, Bar #-}
pattern Foo :: () => C x a => G x -> Embedding a
pattern Bar :: () => D x a => H x -> Embedding a
pattern Foo gx <- (debed -> GADT.Foo gx)
where Foo gx = embed $ GADT.Foo gx
pattern Bar hx <- (debed -> GADT.Bar hx)
where Bar hx = embed $ GADT.Bar hx
Internally this may only be an embedding, but from the outside it's a true isomorphism. Opacity ensures:
Hence embed
and debed
are also inverses in the other direction:
This approach generalises all previous approaches, and at a glance, it looks the simplest and cleanest of the lot. However, the dirt and difficulties are hiding in the directions of the embedding; the dark magicks demonstrated in previous sections will still be needed within. Only, now their form is less constrained and there's room for creativity—room to shoot yourself in the foot. Prefer the previous approaches when they suffice.
You might have noticed something else different—the fields of Foo
and Bar
are lazy?
Indeed, they're allowed to be; only debed
is required to be strict for soundness.
In practice, however—if the newtype
representation is to actually slim down your data
—embed
will re-encode its arguments with minimal runtime indirections, and hence minimal laziness.
Consequently, the fields will almost always end up strict.
The previous section is all-encompassing, so what's left to explore? Well, there's an issue with how it handles recursive data types. To demonstrate, suppose we were implementing
data Fin (n :: Peano) where
Zero :: Fin (S n)
Succ :: !(Fin n) -> Fin (S n)
as a newtype
.
Following the recipe mechanically, we would end up with
pattern Succ :: () => sn ~ S n => GADT.Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ fp)
where Succ fp = embed $ GADT.Succ fp
but that isn't even the right type.
The naïve fix is to reuse embed
and debed
within:
pattern Succ :: () => sn ~ S n => Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ (embed -> fp))
where Succ fp = embed $ GADT.Succ (debed $ fp)
But don't be taken in by the duality—that thing's digging holes just to fill them back in again. We should insist that our pattern matching be shallow. As it happens, there's a nice way to think about this.
Both the recursive GADT and its isomorphic2 opaque newtype
implementation are fixpoints of the same highest-order base functor3 FinF
.
So alongside the usual isomorphism
inGADT :: FinF GADT.Fin n -> GADT.Fin n
outGADT :: GADT.Fin n -> FinF GADT.Fin n
there's also
inNT :: FinF Fin n -> Fin n
outNT :: Fin n -> FinF Fin n
and that's exactly what we need!
To demonstrate in full, let's implement
data Recursive a where
Base :: C x a => G x -> Recursive a
Rec :: D x y a => H x -> Recursive y -> Recursive a
as an opaque newtype with pattern synonyms, given
in_ :: RecursiveF Recursive a -> Recursive a
out :: Recursive a -> RecursiveF Recursive a
with out . in_ = id
, where RecursiveF
is the highest-order base functor3 of Recursive
.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilyDependencies
#-}
module Recursive (Recursive(Base,Rec)) where
import Data.Kind (Constraint)
-- highest-order base functor (functorial in `rec`, not `a`)
data RecursiveF rec a where
BaseF :: C x a => G x -> RecursiveF rec a
RecF :: D x y a => H x -> rec y -> RecursiveF rec a
-- stand-ins
type family C x a :: Constraint
type family D x y a :: Constraint
type family G x = r | r -> x
type family H x = r | r -> x
type family F a
in_ :: RecursiveF Recursive a -> Recursive a
out :: Recursive a -> RecursiveF Recursive a
in_ = undefined
out !x = undefined x
-- ^ out should be strict
-- vvvvvvv nom
type role Recursive nominal
newtype Recursive a = UnsafeRecursive{ extract :: F a }
{-# COMPLETE Base, Rec #-}
pattern Base :: () => C x a => G x -> Recursive a
pattern Rec :: () => D x y a => H x -> Recursive y -> Recursive a
pattern Base gx <- (out -> BaseF gx)
where Base gx = in_ $ BaseF gx
pattern Rec hx ry <- (out -> RecF hx ry)
where Rec hx ry = in_ $ RecF hx ry
This already follows from the previous sections, but I think it's a fact worthy of note: the newtype
encoding allows us to write unboxed GADTs!
UnliftedDatatypes
, true to its name, just lets you declare data
of kind UnliftedType
i.e. TYPE (Boxed Unlifted)
.
UnliftedNewtypes
on the other hand is much more general, and should perhaps have been called UnboxedNewtypes
; it lets you write newtype
s over TYPE
s of any RuntimeRep
.
GADTs and unboxing are hence mutually exclusive in standard formulation, but allowed to coexist in newtype
form.
Since we haven't actually written out any embeddings yet, let's take this opportunity to flesh out a complete example: let's implement the GADT
data Number# :: forall rr. TYPE rr -> UnliftedType where
Int# :: Int# -> Number# Int#
Double# :: Double# -> Number# Double#
at kind forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep])
.
{-#
LANGUAGE
ExplicitNamespaces, MagicHash, UnboxedSums, UnliftedNewtypes, LambdaCase,
GADTs, DataKinds, PatternSynonyms, ViewPatterns, RoleAnnotations
#-}
module Number (Number#(Int#,Double#)) where
import GHC.Exts
(TYPE, RuntimeRep(SumRep,IntRep,DoubleRep), Int#, Double#, type (~~))
import Data.Type.Equality ((:~~:)(..))
import Unsafe.Coerce (unsafeCoerce)
import qualified GADT
embed :: forall {rr} (n :: TYPE rr). GADT.Number# n -> Number# n
debed :: forall {rr} (n :: TYPE rr). Number# n -> GADT.Number# n
embed = \case
GADT.Int# x -> UnsafeNumber# (# x | #)
GADT.Double# y -> UnsafeNumber# (# | y #)
debed (UnsafeNumber# s) = case s of
(# x | #) -> unsafeEquality @n @Int# (GADT.Int# x)
(# | y #) -> unsafeEquality @n @Double# (GADT.Double# y)
-- We need /kind-heterogeneous/ equality, both here and in our pattern types.
unsafeEquality
:: forall {k} {l} {rr} (p :: k) (q :: l) (r :: TYPE rr)
. (p ~~ q => r) -> r
unsafeEquality r = case magic of HRefl -> r
where
magic :: p :~~: q
magic = unsafeCoerce (HRefl @p)
type role Number# nominal
type Number# :: forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep])
newtype Number# n = UnsafeNumber# (# Int# | Double# #)
{-# COMPLETE Int#, Double# #-}
pattern Int#
:: forall {rr} {n :: TYPE rr}. () => n ~~ Int# => Int# -> Number# n
pattern Double#
:: forall {rr} {n :: TYPE rr}. () => n ~~ Double# => Double# -> Number# n
pattern Int# i <- (debed -> GADT.Int# i)
where Int# i = embed $ GADT.Int# i
pattern Double# x <- (debed -> GADT.Double# x)
where Double# x = embed $ GADT.Double# x
And I believe that covers everything!
Just remember that these newtype
s aren't like regular correct-by-construction data
types—you have to treat them like difficult logic and test them thoroughly: what should compile; what shouldn't; what should explode; etc.
Take care, and happy hacking!
Footnotes
-
This argument is somewhat backwards, and only follows if we've already written
embed
anddebed
as in the previous section. However, even if we haven't, so long as we writein_
andout
, opacity will again ensure that they're true inverses. That makes thenewtype
a genuine fixpoint and hence isomorphic to the GADT in question. ↩ -
Mechanically derived by replacing recursive occurrences of the type constructor with a type parameter. If the recursive data type has kind
K
, then its highest-order base functor has kindK -> K
(where kind variable quantifiers are implicit, not captured in the metavariableK
). ↩ ↩2