Based on Paul and Arya's design here, for a codebase editor for Unison.
Unison functions are content addressed - a function's identity (`Id`) depends only on its content.
The 'content' here excludes cosmetic stuff that doesn't affect how the function runs, like names, comments and layout.
When one function has a call to another, it refers to it by `Id` [1].
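To make the content-addressing idea concrete, here is a minimal Python sketch. The helper `content_id` and the `$0` placeholder (meaning "the first callee") are assumptions made up for illustration, not Unison's actual hashing scheme. The only point is that names never enter the hash, while the Ids of callees do.

```python
import hashlib

def content_id(body: str, callee_ids: tuple = ()) -> str:
    """Hypothetical Id: a hash of the normalised body plus the Ids of its callees."""
    h = hashlib.sha256(body.encode("utf-8"))
    for callee in callee_ids:
        h.update(b"\x00" + callee.encode("utf-8"))
    return h.hexdigest()

# No function name appears anywhere in the hashed content.
g_id = content_id("x -> x + 1")
# "$0" stands for "the first callee": the caller refers to it by Id, not by name.
f_id = content_id("y -> $0 ($0 y)", (g_id,))

# Hashing the same content again gives the same Id, whatever anyone calls it.
same_g = content_id("x -> x + 1")
# The same body calling a different function is a different function.
other_f = content_id("y -> $0 ($0 y)", (content_id("x -> x + 2"),))
```

Under these assumptions, `same_g` equals `g_id`, while `other_f` differs from `f_id` because one of its callees has a different Id.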
When I want to talk about a function, I refer to it using a `Name`. My name for the function might be different from yours though!
Unison functions are immutable (once they exist). The bindings between Names and functions are mutable.
Here's an example showing a function with name `f`, which has a call to a function with name `g`:
We want to explain how to 'change the code', in a world where functions are immutable! And we'd like our answer to allow concurrent editing by different people.
Unison function `Id`s are like stars in the sky [2]. You can discover new ones, but the ones that are there don't change.
`Name`s are what you write on your map of the sky. The stars themselves don't know or care :)
At runtime, all we have are `Id`s (unless we're using the `Codebase` ability). `Name`s are only used at edit time, and when a person is selecting what function they want to run.
`Id`s represent implementation; `Name`s represent intention - your name for a function represents what you think it's good for, and what it means to you.
A branch is a `Map Name Id`. It's a map of the sky. It's the thing you're actually changing, when you 'change the code'.
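As a rough illustration of "two maps of the same sky", the sketch below models a branch as a plain Python dict. The names `increment` and `succ` and the function bodies are invented for the example; the two hashes are the example values used later in this note.

```python
# Hypothetical code store: Id -> definition. Entries are only ever added.
code_store = {
    "2nwu84": "x -> x + 1",
    "s9b22k": "x -> x + 2",
}

# Two people, two maps, one star: both names point at the same Id.
my_branch = {"increment": "2nwu84"}
your_branch = {"succ": "2nwu84"}

# 'Changing the code' produces a different map. The store is untouched,
# and your map still points where it did.
store_before = dict(code_store)
my_branch_after = {**my_branch, "increment": "s9b22k"}
```

After the rebinding, `my_branch_after["increment"]` is `s9b22k`, `your_branch["succ"]` is still `2nwu84`, and `code_store` is unchanged.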
The process of making a change is managed using a `Transaction`, a piece of state which evolves as the user goes through the editing process. A `Transaction` is where work-in-progress lives until the user decides it is a coherent change to the code. (It must also be free of conflicts at this point. Conflicts are competing candidate changes, possibly made by different people, which leave Unison unsure which `Id` some `Name` should map to.)
Say we have a branch `B` and a function which that branch calls `f`. It might feel like you're changing `f` when you open it for editing [3], but actually Unison goes through the following process (a picture is below).
- Looking up the `Id` which the `Name` `f` refers to in branch `B`. Let's say the `Id` is the hash `2nwu84`.
- Fetching the function definition associated with that hash in the code store, and exposing it to the user so they can play with it - maybe by pretty-printing it to a file and opening that file in an editor.
- Once the user has come back with their new function definition, slurping that definition into the code store and working out its new `Id`, say the hash `s9b22k`.
- Creating a `Transaction` `t`, which contains the following proposed changes.
  - "Instead of binding the name `f` to the hash `2nwu84`, bind it to `s9b22k`."
  - "Consider `s9b22k` a candidate replacement for `2nwu84`." ("Add `s9b22k` to the 'edited set' for `2nwu84`.")
- Working through the possible consequences of this candidate replacement as follows.
  - If other functions in the code store reference hash `2nwu84`, ask the user whether they want to repeat this process on those functions. This includes deciding whether `Name`s that refer to those functions should be rebound, to point to the new versions.
  - A typical default action would be to propagate out the change, so that `f`'s dependent functions (callers of `f`, for example `g` in the diagram) are swapped out of whatever name bindings they're in, and replaced with functions that call the new `f`. And then continue propagating out the resulting changes transitively.
- Once the user has decided how far they want to propagate the change, we have a `Transaction` which proposes a bunch of name rebindings. We can treat it as a function `t : Branch -> Branch`, and commit the transaction by replacing our working Branch `B` with the new branch `t B`. Note that the transaction also contained a bunch of accumulated information about candidate replacements (edited sets) - but we don't need that any more, and it doesn't directly affect the branch. It was just working data to allow Unison to systematically propose propagation steps to the user during the editing process.
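The process above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `store` layout, the helpers `add` and `propagate`, and the `$0` placeholder for "first callee". The sketch also applies the "typical default action" automatically, where the real process would ask the user at each step.

```python
import hashlib

store = {}  # hypothetical code store: Id -> (body, callee Ids); entries are never modified

def add(body, deps=()):
    """Slurp a definition into the store and return its (hypothetical) Id."""
    new_id = hashlib.sha256(repr((body, tuple(deps))).encode()).hexdigest()[:12]
    store[new_id] = (body, tuple(deps))
    return new_id

def propagate(old_id, new_id, branch, rebindings, edited):
    """Record new_id as a candidate for old_id, rebind names, then repeat for callers."""
    edited.setdefault(old_id, set()).add(new_id)
    for name, bound in branch.items():
        if bound == old_id:
            rebindings[name] = new_id
    # Snapshot the store, since propagation adds new definitions as it goes.
    for caller_id, (body, deps) in list(store.items()):
        if old_id in deps:
            new_deps = tuple(new_id if d == old_id else d for d in deps)
            propagate(caller_id, add(body, new_deps), branch, rebindings, edited)

f_old = add("x -> x + 1")
g_old = add("y -> $0 ($0 y)", (f_old,))   # g calls f, referring to it by Id
B = {"f": f_old, "g": g_old}

f_new = add("x -> x + 2")                 # the user's edited definition
rebindings, edited = {}, {}               # the transaction's working state
propagate(f_old, f_new, B, rebindings, edited)

def t(branch):
    """The transaction viewed as a function Branch -> Branch."""
    return {**branch, **rebindings}

B_committed = t(B)
```

Here `B_committed` binds `f` to the new definition and `g` to a freshly stored caller that references the new `f`, while `B` and every existing store entry are left as they were. The `edited` map is only consulted during propagation and is not needed once `t` has been applied.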
During the process, Unison is guided by the 'edited sets', i.e. by a `Map Id (Set Id)`. This map explains how each `Id` has possibly multiple candidate replacements. This can happen if multiple users are editing a branch at once, as follows.
- Let's say a function named `f` refers to functions named `a` and `b`, and Alice edits `a` while Bob edits `b`. Alice declares her new `a` to be the 'canonical' version, and likewise Bob with `b`.
- Alice will likely decide to add a version of `f` which uses the new `a` but the old `b`; and Bob will add a version which uses the old `a` but the new `b`. They each declare their new candidate `f` canonical.
- The transaction will then contain two new candidate bindings for `f`, and further they both claim to be canonical - this is a conflict.
- Either Alice or Bob will need to work out how to fix the conflict - probably starting by adding a third function referring to the new `a` and new `b`, and a third candidate binding for `f`.
- They then resolve the conflict in the transaction, by declaring the third `f` to be the only canonical one of the three in the edited set.
- Either of them can now commit the transaction, updating the shared branch [4] with the new bindings for `f`, `a`, and `b`.
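The Alice and Bob scenario can be modelled with a small Python sketch. The Ids (`a0`, `f_a1_b0`, and so on), the separate `canonical` set, and the helpers `merge_edits` and `conflicts` are all hypothetical; the note above does not say how 'canonical' is actually recorded.

```python
def merge_edits(*edit_maps):
    """Union several edited-set maps (old Id -> set of candidate Ids)."""
    merged = {}
    for edits in edit_maps:
        for old_id, candidates in edits.items():
            merged.setdefault(old_id, set()).update(candidates)
    return merged

def conflicts(edited, canonical):
    """Old Ids with more than one candidate claiming to be canonical."""
    return {old: cands & canonical
            for old, cands in edited.items()
            if len(cands & canonical) > 1}

# Alice edits a and adds an f using new a, old b; Bob does the mirror image.
alice = {"a0": {"a1"}, "f0": {"f_a1_b0"}}
bob = {"b0": {"b1"}, "f0": {"f_a0_b1"}}
edited = merge_edits(alice, bob)
canonical = {"a1", "b1", "f_a1_b0", "f_a0_b1"}

before = conflicts(edited, canonical)

# Resolution: add a third f using both new a and new b, and make it the
# only canonical member of f0's edited set.
edited["f0"].add("f_a1_b1")
canonical = (canonical - edited["f0"]) | {"f_a1_b1"}

after = conflicts(edited, canonical)
```

In this model `before` reports a conflict on `f0` between the two competing candidates, and `after` is empty, so the transaction could be committed.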
This raises some interesting questions...
- When is it decided that a transaction is actually committed? It surely can't require a synchronization point between all the possible concurrent editors.
- Does it ever make sense to work on two logically distinct changes in the same transaction? Might it yield a better concurrent editing experience, by surfacing conflicts earlier? Or can the system keep watch on the merge of a number of open transactions, in order to spot conflicts?
- A transaction can yield a branch whenever it has no conflicts (although that branch might not yield a useful and consistent set of name mappings!) Might it be useful to run tests over that branch before the transaction is committed? Are the test functions referring to the functions under test in the normal way (by `Id`), or indirecting via `Name` (mediated maybe by the `Codebase` ability)? Maybe they're more useful for assessing the health of work-in-progress if they do the latter?
Unison does name resolution at edit-time - that is, when you're adding a reference to function `g` in your code of function `f`, the name `g` is resolved to a hash, and that hash is actually what gets used by `f`. An important consequence of that is what happens to the business of 'imports' and 'namespaces'. In other programming languages, there is normally a list of import statements at the top of each file, and those are treated as part of the code. In Unison, the set of accessible names is defined by the branch, which is separate from the code. This means different people can view a given function through their own personal 'goggles' - my view of the function uses the names I'm most familiar and comfortable with, which might be different to yours.
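A small Python sketch of edit-time resolution and the 'goggles' idea follows. The token-list representation, the helpers `resolve` and `render`, the name `increment`, and the `#hash` fallback for an Id the viewer has no name for are all assumptions of this sketch.

```python
def resolve(tokens, branch):
    """Edit time: replace each known name with the Id it is bound to right now."""
    return [("id", branch[tok]) if tok in branch else ("lit", tok) for tok in tokens]

def render(stored, branch):
    """View time: show each Id under the viewer's own name for it, if they have one."""
    names = {ident: name for name, ident in branch.items()}
    return " ".join(names.get(value, "#" + value) if kind == "id" else value
                    for kind, value in stored)

mine = {"g": "2nwu84"}
yours = {"increment": "2nwu84"}

# What gets stored for f holds the hash, not the name g.
stored_f = resolve(["x", "->", "g", "x"], mine)

my_view = render(stored_f, mine)
your_view = render(stored_f, yours)

# Rebinding g later does not touch the stored function.
mine_later = {"g": "s9b22k"}
later_view = render(stored_f, mine_later)
```

With these inputs, `my_view` is `x -> g x`, `your_view` is `x -> increment x`, and `later_view` falls back to `x -> #2nwu84 x` because the stored function still refers to the old hash.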
So we'll want to manipulate branches in similar ways to how we used imports before. If I start working with `Widget`s, I might want to add a body of names relating to them to my branch, by injecting it at some prefix:
Foo
bar
List.head
Widget.start -- injected
Widget.stop -- "
Or maybe by merging sets of names without a prefix, and resolving conflicts according to some hiding/shadowing rule.
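Both options can be sketched with dicts in Python. The Ids (`id_foo` and so on) are placeholders, and "names already in my branch win" is just one possible shadowing rule picked for the example.

```python
def inject(branch, prefix, names):
    """Add a body of names to a branch under the given prefix."""
    return {**branch, **{f"{prefix}.{n}": i for n, i in names.items()}}

def merge_shadowing(branch, names):
    """Merge without a prefix; names already in the branch win (one possible rule)."""
    return {**names, **branch}

mine = {"Foo": "id_foo", "bar": "id_bar", "List.head": "id_head"}
widget_names = {"start": "id_start", "stop": "id_stop"}

with_widgets = inject(mine, "Widget", widget_names)

# An incoming 'bar' is shadowed by my existing 'bar'; 'start' is new, so it is added.
merged = merge_shadowing(mine, {"bar": "id_other_bar", "start": "id_start"})
```

`with_widgets` contains `Widget.start` and `Widget.stop` alongside the original three names, and `merged["bar"]` stays bound to `id_bar`.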
Another interesting question: when I start working with someone else's function `f`, how do I learn their names for functions that it calls, which I don't have my own names for? Do they need to send me a whole branch, for me to work with their code? Maybe Unison needs to let them prune their branch down to `f` and its transitive dependencies before sending it to me? Does Unison automatically inject the new names into my branch in some way?
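One way the pruning idea might look, as a Python sketch: the `deps` map (Id to the Ids it calls), the placeholder Ids, and the helper `prune` are hypothetical, and the sketch says nothing about how the pruned names would then be injected into the receiver's branch.

```python
def prune(branch, deps, root_name):
    """Keep only the names bound to root_name's function or its transitive dependencies."""
    keep, stack = set(), [branch[root_name]]
    while stack:
        current = stack.pop()
        if current not in keep:
            keep.add(current)
            stack.extend(deps.get(current, ()))
    return {name: ident for name, ident in branch.items() if ident in keep}

# f calls g, g calls h; 'unrelated' is not reachable from f.
deps = {"id_f": ("id_g",), "id_g": ("id_h",), "id_h": (), "id_z": ()}
theirs = {"f": "id_f", "g": "id_g", "h": "id_h", "unrelated": "id_z"}

to_send = prune(theirs, deps, "f")
```

`to_send` keeps the bindings for `f`, `g` and `h` and drops `unrelated`.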
1: Or maybe I mean `Reference`? This is a bit strange: 'Id is the basic type used to refer to things in a Unison codebase'. Why isn't `Reference` the name for the type used to refer to things?
2: I quite like this stars metaphor, but it breaks down slightly if you thought Unison was something you might run 'in the cloud' :-)
3: The word 'edit' rather encourages the idea that we are actually changing a function, even though it's a central concept that this never happens. Is this a barrier to learning the concept? How about 'adapt' instead? 'You are now creating a function by adapting an existing one.'
4: Is a `Branch` actually a `Name` referring to an `Id` of type `Map Name Id`? If so I guess that map probably doesn't include the name of the branch itself. It would seem a shame to keep sprouting new name/value association schemes, so might be a good idea to just have one and reuse it.
This is awesome! I won't be able to give it and the other comments the attention they deserve tonight, but I'm looking forward to it.