I think many people know about Data.Some.Newtype—it uses quite a cute (and cursed) trick to encode a simple existential wrapper as a newtype, hence avoiding an unwanted indirection in the runtime representation.
But there's more to data than existentials—so how far do these tricks go?
The answer is: surprisingly far!
Let's see what other GADTs we can slim down.
Before jumping into the deep end, there are a few things to consider and be aware of.
Setting GADTs aside for now, which ADTs can be made newtypes?
The basic restrictions are, of course: one constructor1; one field1.
The less immediate issue is that of laziness—a newtype has no runtime barrier against the evaluation of its contents, so it can't truly implement lazy data.
Then, since newtypes aren't able to be lazy, are they automatically strict?
Actually, no.
Since newtype constructors don't exist at runtime, pattern matching on one is no different from pattern matching on a bare variable; it doesn't force anything.
To properly emulate strict data, we must replace the newtype constructor with a strict pattern synonym:
{-# LANGUAGE PatternSynonyms #-}
module Strict (Strict(Strict)) where
newtype Strict a = NonStrict a
{-# COMPLETE Strict #-}
pattern Strict :: a -> Strict a
pattern Strict x <- NonStrict !x
where Strict x = NonStrict xThe specialty of GADTs is to carry around not just regular data, but also type equality and class constraints. They're designed such that these constraints are only exposed by pattern matching on legitimately constructed data—this is called soundness. When you write a real GADT you don't need to worry about it, but working magic requires great care.
Suppose, for example, we wanted to implement the following as a newtype:
data Diag f a b where
Diag :: !(f a a) -> Diag f a aA Diag f a b witnesses that a ~ b:
diagCoerce :: Diag f a b -> a -> b
diagCoerce (Diag _) x = xWhile diagCoerce itself is perfectly valid, if you can supply it a bogus premise then it can give back a bogus result.
Consider:
unsafeCoerce :: forall a b. a -> b
unsafeCoerce = diagCoerce (undefined :: Diag (,) a b)The true GADT maintains soundness through an honourable death—it would rather explode than hand out bad constraints.
Our newtypes must be the same—ensure they're appropriately strict!
Alright, with that out of the way, let's go over the OG.
The key idea is simple enough.
The type information you want to discard, just unsafeCoerce it to the safe target GHC.Exts obligingly provides: Any.
{-# LANGUAGE PatternSynonyms #-}
module Some (Some(Some)) where
import GHC.Exts (Any)
import Unsafe.Coerce (unsafeCoerce)
newtype Some f = UnsafeSome (f Any)
{-# COMPLETE Some #-}
pattern Some :: f x -> Some f
pattern Some fx <- UnsafeSome fx
where Some fx = UnsafeSome (unsafeCoerce fx)It might seem strange that the <- direction type-checks (and I'm not 100% clear on the details myself) but it works because x unifies with Any and isn't bound by the outer universal scope of the pattern type, but its inner existential scope:
pattern Some
:: forall f. ()
=> forall x. ()
=> f x -> Some fSome as written above or in Data.Newtype.Some isn't precisely equivalent to either
data Some f where
Some :: f x -> Some for
data Some f where
Some :: !(f x) -> Some fsince the pattern synonym isn't strict, but this is allowable so long as users are aware—there's no issue of soundness.
Equality constraints are the core feature of GADTs—we're lucky that they're well within reach.
The idea is that if we impose an equality constraint upon construction of the data, then we know it's safe to magic up the same when we take it apart.
For the magic itself, it suffices to wield unsafeCoerce against Refl, then ViewPatterns come to the rescue.
To demonstrate, let's implement
data Equality a where
Equality :: L a ~ R a => !(F a) -> Equality aas a newtype, where L, R and F are stand-ins for whatever pleases you.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}
module Equality (Equality(Equality)) where
import Data.Type.Equality ((:~:)(..))
import Unsafe.Coerce (unsafeCoerce)
-- stand-ins
type family L a
type family R a
type family F a
-- A type parameter that is involved in an equality must be declared nominal!
-- Otherwise, GHC may infer a weaker role and allow it to be `coerce`d to a
-- type that does not satisfy your equality constraint.
type role Equality nominal
newtype Equality a = UnsafeEquality{ extract :: F a }
view :: forall a. Equality a -> (# L a :~: R a, F a #)
view !eq = (# unsafeCoerce (Refl @(L a)), extract eq #)
-- ^-- This bang is necessary for soundness!
{-# COMPLETE Equality #-}
pattern Equality :: () => L a ~ R a => F a -> Equality a
pattern Equality x <- (view -> (# Refl, x #))
where Equality x = UnsafeEquality xThe same approach as for equality constraints applies to empty classes, since we can magic them up in much the same way. Again, let's implement
data HasProp a where
HasProp :: Prop a => !(F a) -> HasProp aas a newtype where Prop and F are stand-ins for an empty class and some data.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples, TypeFamilies
#-}
module HasProp (HasProp(HasProp)) where
import Unsafe.Coerce (unsafeCoerce)
-- stand-ins
class Prop a
type family F a
-- A type parameter that is involved in a class constraint must also be
-- declared nominal, for much the same reason.
type role HasProp nominal
newtype HasProp a = UnsafeHasProp{ extract :: F a }
-- A data type to help us wield `unsafeCoerce` against constraints.
data Dict c = c => Dict
view :: HasProp a -> (# Dict (Prop a), F a #)
view !x = (# unsafeCoerce (Dict @()), extract x #)
-- ^ Again, necessary for soundness!
{-# COMPLETE HasProp #-}
pattern HasProp :: () => Prop a => F a -> HasProp a
pattern HasProp x <- (view -> (# Dict, x #))
where HasProp x = UnsafeHasProp xThe instance we want to magic up does need to be trivial in some sense for our "magic" to work, but it does not strictly need to be empty.
So long as we can compute the instance dictionary from the data we have at hand, it's possible to take things a step further using withDict.
To demonstrate, we'll implement
data Computable a where
Computable :: C a => !(G a) -> Computable aas an opaque newtype with a pattern synonym, where
class C a where
method :: H agiven
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)with computeBackwards . computeForwards = id.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, UnboxedTuples,
TypeFamilies, AllowAmbiguousTypes
#-}
module Computable (Computable(Computable)) where
import GHC.Exts (withDict)
-- stand-ins
type family F a
type family G a
type family H a
class C a where
method :: H a
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)
computeForwards = undefined
computeBackwards = undefined
-- Declare nominal, yada yada.
type role Computable nominal
newtype Computable a = UnsafeComputable{ extract :: F a }
computable :: forall a. C a => G a -> Computable a
computable x = computeForwards (# method @a, x #)
data Dict c = c => Dict
view :: forall a. Computable a -> (# Dict (C a), G a #)
view !x = case computeBackwards x of
-- ^ Bang, yada yada.
(# gx, fx #) -> (# withDict @(C a) gx Dict, fx #)
{-# COMPLETE Computable #-}
pattern Computable :: () => C a => G a -> Computable a
pattern Computable x <- (view -> (# Dict, x #))
where Computable x = computable xThis is essentially what SNat, SSymbol and SChar all do, except that they have the added complication of needing to invent type indices with rank-2 types, e.g.
{-# NOINLINE withSomeSNat #-}
withSomeSNat :: Natural -> (forall n. SNat n -> r) -> r
withSomeSNat n k = k (UnsafeSNat n)If you're doing the same, ensure the function that introduces the indices is NOINLINEd!
See Note [NOINLINE withSomeSNat] for details.
In the previous section, we used
computeForwards :: (# H a, G a #) -> Computable a
computeBackwards :: Computable a -> (# H a, G a #)in construction and matching respectively.
You might already be asking yourself: so long as we embed it reversibly in Computable a, couldn't we use any type we wanted?
And indeed, you would not be wrong—we can even support sums!
To demonstrate, we'll implement
data Embedding a where
Foo :: C x a => G x -> Embedding a
Bar :: D x a => H x -> Embedding aas an opaque newtype with pattern synonyms, given
embed :: GADT.Embedding a -> Embedding a
debed :: Embedding a -> GADT.Embedding awith debed . embed = id, where module GADT exports the standard form above.
{-# LANGUAGE PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilies #-}
module Embedding (Embedding(Foo,Bar)) where
import qualified GADT
-- stand-ins
import GADT (C, D, G, H)
type family F a
embed :: GADT.Embedding a -> Embedding a
debed :: Embedding a -> GADT.Embedding a
embed = undefined
debed !x = undefined x
-- ^ debed should be strict
-- vvvvvvv nom
type role Embedding nominal
newtype Embedding a = UnsafeEmbedding{ extract :: F a }
{-# COMPLETE Foo, Bar #-}
pattern Foo :: () => C x a => G x -> Embedding a
pattern Bar :: () => D x a => H x -> Embedding a
pattern Foo gx <- (debed -> GADT.Foo gx)
where Foo gx = embed $ GADT.Foo gx
pattern Bar hx <- (debed -> GADT.Bar hx)
where Bar hx = embed $ GADT.Bar hxInternally this may only be an embedding, but from the outside it's a true isomorphism. Opacity ensures:
Hence embed and debed are also inverses in the other direction:
This approach generalises all previous approaches, and at a glance, it looks the simplest and cleanest of the lot. However, the dirt and difficulties are hiding in the directions of the embedding; the dark magicks demonstrated in previous sections will still be needed within. Only, now their form is less constrained and there's room for creativity—room to shoot yourself in the foot. Prefer the previous approaches when they suffice.
You might have noticed something else different—the fields of Foo and Bar are lazy?
Indeed, they're allowed to be; only debed is required to be strict for soundness.
In practice, however—if the newtype representation is to actually slim down your data—embed will re-encode its arguments with minimal runtime indirections, and hence minimal laziness.
Consequently, the fields will almost always end up strict.
The previous section is all-encompassing, so what's left to explore? Well, there's an issue with how it handles recursive data types. To demonstrate, suppose we were implementing
data Fin (n :: Peano) where
Zero :: Fin (S n)
Succ :: !(Fin n) -> Fin (S n)as a newtype.
Following the recipe mechanically, we would end up with
pattern Succ :: () => sn ~ S n => GADT.Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ fp)
where Succ fp = embed $ GADT.Succ fpbut that isn't even the right type.
The naïve fix is to reuse embed and debed within:
pattern Succ :: () => sn ~ S n => Fin n -> Fin sn
pattern Succ fp <- (debed -> GADT.Succ (embed -> fp))
where Succ fp = embed $ GADT.Succ (debed $ fp)But don't be taken in by the duality—that thing's digging holes just to fill them back in again. We should insist that our pattern matching be shallow. As it happens, there's a nice way to think about this.
Both the recursive GADT and its isomorphic2 opaque newtype implementation are fixpoints of the same highest-order base functor3 FinF.
So alongside the usual isomorphism
inGADT :: FinF GADT.Fin n -> GADT.Fin n
outGADT :: GADT.Fin n -> FinF GADT.Fin nthere's also
inNT :: FinF Fin n -> Fin n
outNT :: Fin n -> FinF Fin nand that's exactly what we need!
To demonstrate in full, let's implement
data Recursive a where
Base :: C x a => G x -> Recursive a
Rec :: D x y a => H x -> Recursive y -> Recursive aas an opaque newtype with pattern synonyms, given
in_ :: RecursiveF Recursive a -> Recursive a
out :: Recursive a -> RecursiveF Recursive awith out . in_ = id, where RecursiveF is the highest-order base functor3 of Recursive.
{-#
LANGUAGE
PatternSynonyms, ViewPatterns, RoleAnnotations, TypeFamilyDependencies
#-}
module Recursive (Recursive(Base,Rec)) where
import Data.Kind (Constraint)
-- highest-order base functor (functorial in `rec`, not `a`)
data RecursiveF rec a where
BaseF :: C x a => G x -> RecursiveF rec a
RecF :: D x y a => H x -> rec y -> RecursiveF rec a
-- stand-ins
type family C x a :: Constraint
type family D x y a :: Constraint
type family G x = r | r -> x
type family H x = r | r -> x
type family F a
in_ :: RecursiveF Recursive a -> Recursive a
out :: Recursive a -> RecursiveF Recursive a
in_ = undefined
out !x = undefined x
-- ^ out should be strict
-- vvvvvvv nom
type role Recursive nominal
newtype Recursive a = UnsafeRecursive{ extract :: F a }
{-# COMPLETE Base, Rec #-}
pattern Base :: () => C x a => G x -> Recursive a
pattern Rec :: () => D x y a => H x -> Recursive y -> Recursive a
pattern Base gx <- (out -> BaseF gx)
where Base gx = in_ $ BaseF gx
pattern Rec hx ry <- (out -> RecF hx ry)
where Rec hx ry = in_ $ RecF hx ryThis already follows from the previous sections, but I think it's a fact worthy of note: the newtype encoding allows us to write unboxed GADTs!
UnliftedDatatypes, true to its name, just lets you declare data of kind UnliftedType i.e. TYPE (Boxed Unlifted).
UnliftedNewtypes on the other hand is much more general, and should perhaps have been called UnboxedNewtypes; it lets you write newtypes over TYPEs of any RuntimeRep.
GADTs and unboxing are hence mutually exclusive in standard formulation, but allowed to coexist in newtype form.
Since we haven't actually written out any embeddings yet, let's take this opportunity to flesh out a complete example: let's implement the GADT
data Number# :: forall rr. TYPE rr -> UnliftedType where
Int# :: Int# -> Number# Int#
Double# :: Double# -> Number# Double#at kind forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep]).
{-#
LANGUAGE
ExplicitNamespaces, MagicHash, UnboxedSums, UnliftedNewtypes, LambdaCase,
GADTs, DataKinds, PatternSynonyms, ViewPatterns, RoleAnnotations
#-}
module Number (Number#(Int#,Double#)) where
import GHC.Exts
(TYPE, RuntimeRep(SumRep,IntRep,DoubleRep), Int#, Double#, type (~~))
import Data.Type.Equality ((:~~:)(..))
import Unsafe.Coerce (unsafeCoerce)
import qualified GADT
embed :: forall {rr} (n :: TYPE rr). GADT.Number# n -> Number# n
debed :: forall {rr} (n :: TYPE rr). Number# n -> GADT.Number# n
embed = \case
GADT.Int# x -> UnsafeNumber# (# x | #)
GADT.Double# y -> UnsafeNumber# (# | y #)
debed (UnsafeNumber# s) = case s of
(# x | #) -> unsafeEquality @n @Int# (GADT.Int# x)
(# | y #) -> unsafeEquality @n @Double# (GADT.Double# y)
-- We need /kind-heterogeneous/ equality, both here and in our pattern types.
unsafeEquality
:: forall {k} {l} {rr} (p :: k) (q :: l) (r :: TYPE rr)
. (p ~~ q => r) -> r
unsafeEquality r = case magic of HRefl -> r
where
magic :: p :~~: q
magic = unsafeCoerce (HRefl @p)
type role Number# nominal
type Number# :: forall rr. TYPE rr -> TYPE (SumRep [IntRep, DoubleRep])
newtype Number# n = UnsafeNumber# (# Int# | Double# #)
{-# COMPLETE Int#, Double# #-}
pattern Int#
:: forall {rr} {n :: TYPE rr}. () => n ~~ Int# => Int# -> Number# n
pattern Double#
:: forall {rr} {n :: TYPE rr}. () => n ~~ Double# => Double# -> Number# n
pattern Int# i <- (debed -> GADT.Int# i)
where Int# i = embed $ GADT.Int# i
pattern Double# x <- (debed -> GADT.Double# x)
where Double# x = embed $ GADT.Double# xAnd I believe that covers everything!
Just remember that these newtypes aren't like regular correct-by-construction data types—you have to treat them like difficult logic and test them thoroughly: what should compile; what shouldn't; what should explode; etc.
Take care, and happy hacking!
Footnotes
-
This argument is somewhat backwards, and only follows if we've already written
embedanddebedas in the previous section. However, even if we haven't, so long as we writein_andout, opacity will again ensure that they're true inverses. That makes thenewtypea genuine fixpoint and hence isomorphic to the GADT in question. ↩ -
Mechanically derived by replacing recursive occurrences of the type constructor with a type parameter. If the recursive data type has kind
K, then its highest-order base functor has kindK -> K(where kind variable quantifiers are implicit, not captured in the metavariableK). ↩ ↩2
Reddit thread: https://www.reddit.com/r/haskell/comments/1m1bxhx/gadts_that_can_be_newtypes_and_how_to_roll_em/
Discourse thread: https://discourse.haskell.org/t/gadts-that-can-be-newtypes-and-how-to-roll-em/12484