@pchiusano
Last active May 19, 2020 16:56
Distributed programming API for Unison
```unison
unique ability Remote loc task result g where
  at : loc -> '{Remote loc task result g, g} a -> task a
  fork : '{g} a -> task a
  await : task a -> result a
  cancel : task a -> ()
  location : task a -> loc

type Value a = Value a

Remote.local.sequential.handler : Request {Remote () Value Value g} a ->{g} a
Remote.local.sequential.handler = cases
  { a } -> a
  { Remote.at _loc a -> k } ->
    v = Value.Value !a
    handle k v with Remote.local.sequential.handler
  { Remote.await val -> k } -> handle k val with Remote.local.sequential.handler
  { Remote.cancel _ -> k } -> handle k () with Remote.local.sequential.handler
  { Remote.location _ -> k } -> handle k () with Remote.local.sequential.handler

unique ability Durable.R d where
  restore : d a -> a

unique ability Durable.W loc d where
  save : a -> d a
  saveAt : loc -> a -> d a
  location : d a -> loc

unique ability Ephemeral.R e where
  restore : e a -> a

unique ability Ephemeral.W pool loc e where
  pool : pool
  save : a -> e a
  saveAt : loc -> a -> e a
  location : e a -> loc
  retain : pool -> ()
```
pchiusano commented May 19, 2020

Design notes from this first cut:

  • at returning a task a is handy, but raises a question: if the spawned task has completed with some value a on another node, how long do we keep that value around?
    • at could instead just return (), with any result communicated via some sort of "remotely accessible" MVar thing. This does not seem to help, since that magic MVar faces the same questions.
    • What about a keepalive system where anyone with a reference to a task keeps it alive via regular heartbeats? Once nothing running references the task, it goes away. Feels complicated, and perhaps more problematic, it involves a lot of communication - imagine spawning millions of these logical tasks - are they all sending keepalives? Maybe the keepalives are batched. Overall feels complex, though it might be the nicest API.
    • A third option is to have task results expire a few seconds after completion; if nothing is await-ing the result by then, the result is discarded. No keepalives needed. Correct usage is to await tasks shortly after spawning them. Perhaps this is awkward to program with, but some helper functions might get you a long way. Needs some experimentation.
  • I wondered if Ephemeral is superfluous. Perhaps you can represent distributed in-memory data types just with nested tasks. See sketch below.
  • You could remove the location from Durable and just do save : a -> (loc, d a). I think this is probably worse: the location returned by the location operation is not necessarily the location that was passed to saveAt - the durable may have migrated in the meantime. location can give the latest location, so tasks operating on the data can be spawned close by.
  • It's slightly awkward that await returns a result a. The thought was to allow different handlers to supply different result types (like Either MySpecialException a), but you really want something like a typeclass here.
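The "await shortly after spawning" discipline from the third option can be packaged as a tiny helper. A hedged sketch only - Remote.call and Value.get are my names, not part of the gist - written against the Value instantiation used by the local sequential handler:

```unison
-- Hypothetical accessor for the Value wrapper (assumed, not in the gist).
Value.get : Value a -> a
Value.get = cases Value.Value a -> a

-- Hypothetical helper: spawn at a location and await right away, so the
-- task result is consumed well before any expiry window closes.
Remote.call loc a =
  Value.get (Remote.await (Remote.at loc a))
```

With a helper like this as the default way to run remote computations, the expiry scheme's usage rule is upheld by construction rather than by programmer discipline.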

Here's a sketch of a simple distributed data type, a Stream:

```unison
type Stream g a
  = Empty
  | Delay ('{g} (Stream g a))
  | Cons a (Stream g a)

use Stream Empty Delay Cons

Stream.save = cases
  Empty -> Empty
  Cons hd tl -> Cons hd (Stream.save tl)
  Delay s ->
    t = Remote.fork '(Stream.save !s)
    Delay '(toException (await t))
```

The Stream.save function materializes the stream, but not all at one location. At each Delay node, the remainder of the stream is scheduled onto another logical node. When Stream.save completes, the entire Stream will be in memory, spread across multiple logical nodes. Pretty epic. With a more interesting data structure than Stream, this is how you can do Spark-like workflows.
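For a sense of how such a materialized stream would be consumed, here is a hedged sketch of a forcing function (Stream.toList is my name, not in the gist). Each Delay it hits runs the thunk, which awaits the task holding that portion of the stream on whichever node computed it:

```unison
-- Assumed consumer: folds the stream to a list, forcing each Delay.
-- +: is the usual list cons from base.
Stream.toList = cases
  Empty -> []
  Cons hd tl -> hd +: Stream.toList tl
  Delay s -> Stream.toList !s
```

On an infinite stream this never returns, which is exactly the reification hazard mentioned above.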

OTOH, it does seem very easy to accidentally reify an infinite amount of data. (Or maybe it's exactly as easy as it is currently, like if you call toList on any other infinite stream.) You can always cancel the task.

The problem with the above implementation is we don't await the task until the stream is forced - basically it's using task in place of Ephemeral. So it falls afoul of the suggestion that await should immediately follow the fork/at call.
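One direction consistent with that suggestion, purely as a speculative sketch (the name and the hand-off to Ephemeral.W are my assumptions, not from the gist): await each forked task immediately and stash its result in ephemeral storage, so no task result outlives the brief window between completion and await.

```unison
-- Speculative variant: every Remote.fork is followed immediately by an
-- await, and the awaited result is handed to Ephemeral.W.save, so the
-- materialized data lives in ephemeral storage rather than in task
-- results. toException is the same assumed converter as in Stream.save.
Stream.save2 = cases
  Empty -> Empty
  Cons hd tl -> Cons hd (Stream.save2 tl)
  Delay s ->
    t = Remote.fork '(Stream.save2 !s)
    saved = Ephemeral.W.save (Remote.await t)
    Delay '(toException (Ephemeral.R.restore saved))
```

Note the trade-off: this makes the save eager, since each level blocks on its child before returning, in exchange for respecting the await-immediately rule.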
