@agocorona
Last active March 3, 2019 18:59
Erlang versus transient distributed computing

I have been watching this interesting presentation on distributed computing in Erlang/Elixir: https://www.youtube.com/watch?v=AiN4r8E9qKg

I want to highlight its notable points and compare them with the Transient way of distributed computing:

min 1:30 Erlang defines a cookie as a command-line parameter. That is a good idea: connecting nodes are required to present the same cookie.

min 3:00 Erlang nodes connect all nodes with all others by default and maintain a heartbeat between them, probably because the communication protocol is low level in order to be very fast; it probably uses UDP. Transient connects only on demand, and a node may route requests to other destination nodes. It is made for weakly coupled internet nodes rather than for a closely coupled local area network. For this reason Transient uses TCP sockets and websockets. A node behind a firewall can be accessed through the HTTP port as long as websockets are enabled. Once connected, the remote node can address nodes internal to the firewall through the gateway node.

min 4:30 Erlang sends asynchronous messages between processes identified by a PID that is unique across the network. The send/receive is transparent: serialization and deserialization are done by the OTP runtime. The ordering of messages between a pair of nodes is guaranteed, but the delivery of every single message is not (at most once). If two processes are sending to a third, the time ordering of their messages is not guaranteed.

Transient uses a completely different mechanism. It does not use the concept of a process, understood as a computation which runs in its own thread and sends and receives messages through its mailbox. Although this can be simulated, since transient-universe has mailboxes (see below), Transient communications are more like teleporting the computation from one node to another. What is transferred is not a message but the local variables of the computation, in other words the stack, so the remote node continues the execution until it executes another distributed primitive that moves the computation to some other node. One of these primitives is teleport, which moves the computation to the most recently connected remote node. A second teleport returns the computation back to the calling node. wormhole is used to connect to that node:

to run a computation in a remote node:

runAt remotenode computation = wormhole remotenode $ do { teleport; r <- computation; teleport; return r }

So a transient program is a "do" sequence where the program execution switches nodes back and forth, without any notion of a process being fixed in a particular place. In this sense processes are created and destroyed in the nodes dynamically while the distributed computation executes.

do
   result <- runAt here dothis
   runAt there dothat
   ...

Don't think that transporting the computation context from node to node is heavy: basically the monad creates and interprets REST-like paths which address the remote location where the code must continue and carry the necessary parameters. There are other optimizations: variables already sent are never resent.

Since the execution is synchronous by default, ordering of messages can be guaranteed, although asynchronous messages can be sent too if required. If the remote computation executes empty, the computation never returns back, so it is effectively asynchronous.
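
As a minimal sketch of that asynchronous style, using the transient-universe primitives discussed throughout this note (notifyRemote and the printed message are illustrations, not library names):

   -- fire-and-forget: the remote computation ends in `empty`,
   -- so no result teleports back and the caller does not wait
   notifyRemote node msg = runAt node $ local $ do
       liftIO $ putStrLn msg   -- do some work in the remote node
       empty                   -- return zero results: asynchronous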

min 11:50 As in the case of Erlang/Elixir, there is a single socket connection between two nodes, since wormhole connections reuse the same socket. A big message can therefore delay the rest of the messages.

min 12:48 Since remote processes are created and destroyed on demand, there is no need to discover already-running processes in remote nodes, so no global registration service is necessary.

With one exception: a process can be created permanently in another node, for example to stream back some events.

   text <- runAt remotenode $ local $ waitEvents   getLine
   localIO $ putStrLn text

That program reads the standard input in the remote node and returns each line back to the calling node. waitEvents sets up a thread that executes getLine endlessly in a loop. The second teleport of runAt transports each string back to the calling node.

At some point it may be necessary to kill this process. For this purpose a remote process can be named, and identified by its name and the node in which it was executed, with setRemoteJob name node and killRemoteJob name.
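
A hedged sketch of such a named streaming job; the job name is invented here, and the argument shapes of setRemoteJob/killRemoteJob follow this note rather than a checked library signature:

   -- name the job before creating it, so it can be killed later
   streamLines node = do
       setRemoteJob "stdin-reader" node
       text <- runAt node $ local $ waitEvents getLine
       localIO $ putStrLn text

   -- somewhere else in the program:
   stopStream = killRemoteJob "stdin-reader"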

If the remote process does not spawn additional threads, this problem does not exist, since after the second teleport the remote process dies and its data is garbage collected.

min 14:08 Erlang uses monitors to guarantee that processes are alive, do the housekeeping when processes fail, etc. Since transient processes are created and (if necessary) killed on demand, there is less need for a third process in charge of monitoring.

If something fails in a remote node, transient has a unique feature called cloud exceptions: the programmer can set up the program so that exceptions in remote nodes are handled in the calling node, which can take the appropriate actions depending on the nature of the problem. The calling node is thus both the client and the monitor.
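
A sketch of the idea, assuming an onException-style handler installed in the calling node; the handler name and the remote action riskyRemoteAction are assumptions for illustration, not a verified cloud-exception API:

   -- handle in the caller a failure raised while running code in `node`
   monitoredCall node = do
       local $ onException $ \(e :: SomeException) ->
           liftIO $ putStrLn $ "remote computation failed: " ++ show e
       runAt node $ local $ liftIO riskyRemoteAction  -- riskyRemoteAction is hypothetical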

However there is something additional. Erlang programs in different nodes are supposed to share the same code. Transient programs with completely different codebases can communicate using the Transient.Move.Services module. There is an executable built with transient-universe called "monitorService" which is in charge of downloading/compiling/executing different executables, as they are demanded by other running transient executables (using callService).

"monitorService" is a program-level monitor, not a process-level monitor.
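
For illustration, a hedged sketch of a service call; the shape of the service descriptor and the exact callService signature vary between transient-universe versions, so the fields and URL below are assumptions:

   -- describe where the service executable lives; monitorService can
   -- download, compile and run it on demand
   echoService = [("service", "echo"), ("executable", "echo-service"),
                  ("package", "https://github.com/example/echo-service")]

   useEcho = do
       r <- callService echoService ("hello" :: String)
       localIO $ putStrLn (r :: String)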

min 16:26 In case of partitions/splits of the network, the registration of process names, necessary in Erlang, produces problems that are absent in transient. However, a split may of course inherently produce a loss of data in any distributed system.

In transient, a communication failure may produce an exception in both nodes that the programmer can handle depending on his particular use case, and he may know the node with which the connection failed.

min 19:30 A pub-sub mechanism is used heavily in Erlang, since it is an asynchronous architecture. To produce the same behaviour, transient can broadcast to all the nodes and put a message in the mailbox of each node, followed by empty:

  runAt node1 (local $ putMailBox "hello" >> empty) <|> runAt node2 (local $ putMailBox "hello" >> empty)

  clustered $ local $ putMailBox "hello" >> empty

The first puts "hello" in the mailbox for strings of two remote nodes. The second form sends to all the nodes known by the local node. Since they use empty, both are asynchronous.
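
The receiving side of this mailbox-based pub-sub can be sketched with getMailBox, which waits until a value of the requested type appears in the local mailbox (subscriber is an illustrative name):

   -- subscriber: runs in each node that received the broadcast
   subscriber = do
       msg <- local (getMailBox :: TransIO String)
       localIO $ putStrLn $ "received: " ++ msg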

There is a more optimized pub-sub mechanism being developed, since transient networks consider web nodes part of the cloud: if the program is compiled with GHCJS, the Haskell-to-JavaScript compiler, it can run in a web browser, which receives the program when it accesses the server node. So a single server can have hundreds or thousands of nodes connected via websockets, and publishing by indiscriminate broadcasting is not efficient in this case.

min 21:40 There is work in progress to allow strongly consistent solutions in transient, but in general, as the lecturer says, this is a responsibility of the particular programmer for his particular tradeoffs.

min 26:47 Distributed computing in Transient is not necessarily about asynchronous messages. A distributed call like runAt node todo can be asynchronous (if it executes empty, meaning it returns zero results), synchronous (it returns a single result to the caller) or streamed (if it runs a non-deterministic primitive like waitEvents in the remote node).
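
The three interaction styles can be summarized in one sketch, under the same assumptions as the examples above (doWork is a hypothetical IO action):

   synchronous n  = runAt n $ local $ return "one result"      -- one result back to the caller
   asynchronous n = runAt n $ local $ liftIO doWork >> empty   -- zero results: fire-and-forget
   streamed n     = runAt n $ local $ waitEvents getLine       -- many results over time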

<><
