thoughts about Unison type declarations

Here's an unsorted bag of thoughts about the ideas in this draft of Arya's. It's long (sorry) but do see in particular the few bullets at the end.

(1) What's the motivation for going beyond structural types?

If those were all we had, then in the following,

structural Color = Red | Green | Blue | Yellow
warm x = case x of Red -> True; Yellow -> True; _ -> False

structural Direction = North | South | East | West
longitudinal x = case x of North -> True; South -> True; _ -> False

... then warm and longitudinal might end up hashing to the same function. That seems kooky at first, but is it actually even bad? warm and longitudinal become two different names for the same function, just as any function can have multiple names.

Thinking about it, I guess it would be awkward, that when editing a function using warm, the UI wouldn't know whether which name to tell you.

I guess you could see that as a demonstration of either of the following.

(A) We need type identity to be able to include the intended meaning of the type, not just its structure.

(B) Part of the metadata that needs to accompany function definitions, along with, say, the function's name, is the name we should be preferring when rendering functions that it calls. i.e. there's an ABT-shaped tree of metadata helping interpret all the function calls being made.

Are we sure it's not actually (B)?

(2) Nominal types option 1 (identify by GUID chosen at codebase-add time)

It does seem a bit painful that accidents of history about which people added what type when, and whether they knew each other at the time, should be irreversibly committed into the identity of bits of the code for ever more.

I guess there could be a 'swap data types' refactoring, where one compatible nominal type is swapped out for another. OK, that changes the identity of lots of code at once, but many things people can do will do that.

I was toying with the idea of people being able to define witnesses for isomorphisms between identical types (i.e. a trivial function that maps from your Color type to my Color type), and saying that if it knows about such an isomorphism, then Unison treats the types as interchangeable. Then the type identity effectively becomes the equivalence class of isomorphisms. That's nice and open-world. Not very simple though.

(3) Nominal types option 2 (identify by hash of set of (constructor's hash and name))

This would introduce a dependence from Unison's (post-edit/add) semantics up into the world of names. I think that's too big a departure from the Unison design principle that names are metadata.

Also, saying that a type's identity is its constructors'names is not radically better than saying that a type's identity is just its own name. Either way we've lost the property that code identity is invariant under naming. And if anything changes to constructor names will be more frequent than to the overall type name.

(4) Opaque types

It's not clear whether you mean for the 'member functions' (excuse the terminology..) to be part of the type identity (presumably just their signatures). I'm hoping yes because I quite like this idea. It's saying that the type's identity is determined by things you can do with it, which has something a bit categorical about it. The algebraic types idea is similar (the same? why are these different?)

Or maybe you mean for opaque types to be an answer to 'data modularity'. I don't know whether I'm using the right term for that, but what I mean is the following.

I can specify the interface I require from a type, write code to that interface, and then later bind the code to a particular type that supports that interface.

I don't know what our plan is for data modularity for Unison, or if we even see it as desirable. Maybe the ability system is what gives it to us (if we're happy for all our interfaces to be of the 'you can do this then you can do that' sequential sort.)

Anyway, I'd like to hear a bit more about the opaque types idea, and whether it's aimed at type identity, modularity, or both.

FWIW I'd been thinking "OK maybe there are a characteristic set of functions that manipulate a type, which we can treat as that type's identity". I hadn't got as far as thinking that that characteristic set was exactly the type's interface.

I'd worry that actually changes to the details of the signatures of a type's interface would be quite common, and not necessarily a fundamental change of the type.

(5) algebraic types

I'd love to know more about this (==) type constructor. How dependent are we getting here?

Why are algebraic types needed? Why not use a nominal type?

As I said above in (4), maybe algebraic and opaque types are the same thing (depending on whether I understood opaque types right.)

Maybe you can view algebraic types as just another structural type. Let's write out a structural type in GADT syntax since it looks suggestive.

-- structural List a = Nil | Cons a (List a)

data List a where
  Nil : List a
  Cons : a -> List a -> List a

You could add some operations and laws and treat them as all part of the package, like constructors.

data List a where
  Nil : List a
  Cons : a -> List a -> List a
  map : (a -> b) -> List a -> List b
  mapNil : (f : a -> b) -> Nil -> f Nil == Nil
  mapCons : (f : a -> b) -> (hd : a) -> (tl : List a) -> f (Cons hd tl) == Cons (f hd) (map f tl)

In this way of looking at things, what we were calling structural types are just the special case where all the 'member functions' of the type are constructors.

Taking this 'it's all one thing' impulse to an extreme, you could view nominal types as sugar for

data Color where
  Red : Color
  Green : Color
  Blue : Color
  %guid : () -- a value of type () whose definition has unique identity created at add-time

OK I'm not sure whether I'm just playing with syntax here or actually saying something interesting, but I've written it up again as a 'here's how it works' to see if it's an interesting story, see (8) below.

(6) is this core semantics or just edit-time?

We should be suspicious of inventing multiple types of type. Why isn't there a single natural approach that works? Haskell's take on data types (well, basic ADTs anyway) is simple and general. Is it really right that aspects of Unison's approach change something as fundamental as 'what sort of stuff is data'?

Should we think of the nominal/structural distinction as something that is actually part of the code, or is it rather a directive about how the code is added to the codebase? That then makes me feel better - there's just one kind of data at the level of core semantics; what we're talking about here is how we find and edit code.

(7) misc

Do we want a type's identity to change when we add a new constructor to it? And do we want that edit to open up all the definitions that reference that type? I guess one answer would be 'yes, so that the user can add extra cases for the new constructor to their pattern matches'. But that can't be the whole answer since some languages have an exhaustiveness checker for pattern matches (do we want one?)
Can we migrate code from one variety of this menu of types to another?

(8) Another way of looking at it

Here's an alternative telling of the datatype story... (trying to use a nice regular syntax for this but there might be sugar). I'm not saying this is any kind of leap forward from Arya's version, or that it's any better thought out - just that it's an interesting spin on it.

In Unison we define datatypes as follows.

data Person where
  Person : Name -> Age -> Person

The first line data Person where tells Unison that a type is being defined. The subsequent lines associate a number of functions with that type. Collectively we call these function's the type's 'associated functions'.

Strictly speaking, the line Person : Name -> Age -> Person adds two associated functions to Person: the constructor, of type Name -> Age -> Person; and another, the eliminator, of type Person -> Name -> Age, used when pattern matching on the type.

In Unison, just as functions are identified by a hash of their definition (after stripping out metadata like names), so too datatypes are identified by a hash of the definition of their associated functions.

Note that it's the definition that is hashed, not just the type signature.

Unison treats our type above as identical to the following!

data Wine where
  Wine : Name -> Age -> Wine

This is sometimes not what you want - it means you'd need to help Unison decide whether to display uses of this type as Wine or Person, when it should be obvious from the meaning of the user code.

You can fix this by making the type a 'nominal' type, as follows:

data Wine where
  Wine : Name -> Age -> Wine
  %guid : ()

This adds a special associated function with a definition that is set when you add the type to the codebase, and chosen to be globally unique. That's enough to give Wine its own identity, separate from other types which look the same.

Sometimes data types can only mean one thing, in which case no %guid is needed - for example:

data Maybe a where
  Nothing : Maybe a
  Just : a -> Maybe a

This has the benefit that even where different people introduce their own definitions of this type at different times and even with different names, their code will be interoperable.

Sometimes, non-constructor associated functions are enough to pin down what a type means.

data NumberPrefixMap a where
  Leaf : a
  Node : List (Nat, NumberPrefixMap a)
  lookup : List (Nat) -> a

... is different from ...

data RoseTree a where
  Leaf : a
  -- nodes are labelled with the sizes of their subtrees
  Node : List (Nat, RoseTree a)
  size : RoseTree a -> Nat

In these examples, what otherwise might have just been free-standing functions, like lookup ans size are associated with the type in order to disambiguate them.

.........

Reasons to be sceptical:

as presented, the choice of what functions are associated with the type is arbitrary, and just comes down to 'what I think might disambiguate this type from other types that I don't even know about'
I struggled for an example where associated functions were necessary and sufficient to disambiguate types. And why is this even better than %guid?

Current suspicions

My feeling after all that is that

we should view nominal and structural as the same kind of thing in terms of core language semantics
the difference is a kind of codebase editing directive about how the type is added to the codebase
we need a good story for reconciling different nominal types that ought to be the same - is this just a type of refactoring like any other?
algebraic types are not needed - just use nominal
adding extra characteristic functions to a types identity is a red herring (I'm not even sure if anyone was suggesting that other than me)
we ought to have some story for data modularity.

atacratic/type-decl-thoughts.md