Yggdrassil design document

This paper discusses ideas for extending HyParView/Plumtree beyond the most straightforward implementation. It presents the goals we had in mind when building the actual implementation in Rust.

Motivation

While Partisan - the original and probably most widely adopted implementation of HyParView+Plumtree - builds its value proposition on huge clusters (>10,000 nodes) capable of broadcasting messages via self-healing broadcast trees, its network protocol seems to limit it to centralized deployments: services living in data centres or at the edge, where peer discoverability and reachability are under the control of their developers/operators. Another issue is that when exchanging neighbors during a shuffle, HyParView blindly merges the received peers into the current passive view without first checking whether they are reachable from the current peer at all.

We think that implementing HyParView/Plumtree in Rust with an extensible API could expand its reach to all kinds of P2P apps, including desktop, mobile and embedded devices. For this to happen, the network protocol should not be tightly coupled to TCP; a generic, more pluggable networking layer is desirable. As a reference point, we would eventually like to be able to host our cluster over the WebRTC protocol as well.
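To make that concrete, below is a minimal sketch of what such a pluggable networking layer could look like in Rust. All names here (`Transport`, `Connection`, `PeerId`) are hypothetical, not taken from any existing crate; the point is only that the membership layer talks to an abstract trait, so TCP, QUIC or WebRTC backends can be swapped in (requires Rust 1.75+ for `impl Trait` in trait return position).

```rust
use std::future::Future;

/// Opaque peer identifier; a real implementation would likely wrap a
/// public key or a random 128/256-bit value. (Hypothetical type.)
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PeerId(pub Vec<u8>);

/// A bidirectional, message-oriented link to a single peer.
pub trait Connection {
    type Error;
    fn send(&mut self, payload: &[u8]) -> impl Future<Output = Result<(), Self::Error>>;
    fn recv(&mut self) -> impl Future<Output = Result<Vec<u8>, Self::Error>>;
}

/// Abstracts over connection establishment, so the membership
/// protocol above never mentions TCP directly. For WebRTC, `addr`
/// would carry signalling material (an SDP offer) rather than an IP
/// endpoint.
pub trait Transport {
    type Conn: Connection;
    type Error;
    fn connect(&self, addr: &[u8]) -> impl Future<Output = Result<Self::Conn, Self::Error>>;
    fn accept(&self) -> impl Future<Output = Result<Self::Conn, Self::Error>>;
}
```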

Challenges of WebRTC

When working with P2P devices, the main networking problem is discovering how to reach a particular peer in the first place. NAT traversal is to be expected, and it prevents us from simply using the peer's IP address as the peer itself sees it. Protocols like STUN (and TURN, to bypass firewalls) were developed for exactly these cases.

Once we know how our address looks from the outside, we can register it in some form of 3rd party tracker service, which provides peer-id/address mappings to other peers that want to communicate. Usually the tracker service is needed only to join a cluster (to obtain the first accessible address).

WebRTC, however, doesn't exactly work on the standard premise of IP addresses as a means to establish a connection. What it requires is an offer/answer cycle: a peer first publishes an offer (just like registering on a tracker server). However, other peers cannot establish a connection based on the offer alone: an answer has to be propagated back before the peers can be paired, usually through the same mediator service. Moreover, an offer/answer works more like a ticket: once it has been used, it's consumed, and establishing a new connection with the same peer requires creating and registering a fresh offer. This process is called signalling and is not solved by the WebRTC protocol itself. For this reason signalling servers play a much more active role in connection establishment than tracker servers do. This could again be side-loaded as an extension of the HyParView membership protocol, where cluster members act as signalling services for newly joining peers, and an external signalling service is required only to resolve the initial join request.
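As a rough illustration, the signalling traffic relayed by cluster members could be modelled as just another set of membership messages. The variants below are illustrative assumptions, not part of HyParView or any WebRTC library:

```rust
/// Placeholder peer identifier for this sketch.
pub type PeerId = u64;

/// Signalling messages relayed through already-connected cluster
/// members; an external signalling server is then needed only for
/// the very first join.
pub enum Signal {
    /// SDP offer created by the connecting peer. Offers are one-shot:
    /// once consumed, a fresh one must be generated for the next
    /// connection attempt.
    Offer { from: PeerId, to: PeerId, sdp: String },
    /// SDP answer travelling back along the same relay path.
    Answer { from: PeerId, to: PeerId, sdp: String },
    /// ICE candidates discovered via STUN/TURN, forwarded as they
    /// trickle in.
    IceCandidate { from: PeerId, to: PeerId, candidate: String },
}
```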

Peer Info / Manifest

Each peer is identified by a peer-id. However, when storing information about passive/active peers, we don't keep only that: we also keep some metadata about the peer, which may be useful for other cluster-management operations. If the peer info changes, it could be propagated dynamically as gossip over Plumtree, or it could require resetting the connection (the latter may be necessary e.g. when the info describes WebRTC data channels that changed, as these cannot be modified without renegotiating the connection).
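A possible shape of such a manifest is sketched below; the exact fields are assumptions about what cluster-management metadata could include:

```rust
use std::collections::HashMap;

/// Hypothetical peer manifest stored for every known peer.
pub struct PeerInfo {
    /// Unique peer identifier.
    pub peer_id: u64,
    /// Transport-specific addresses this peer may be reachable at.
    pub addresses: Vec<String>,
    /// Free-form metadata, e.g. protocol version, capabilities, or
    /// the set of negotiated WebRTC data channels. Changing the
    /// latter forces a reconnect (channels cannot be modified on a
    /// live connection); other changes can simply be gossiped.
    pub metadata: HashMap<String, String>,
    /// Monotonic version so peers can discard stale gossip.
    pub version: u64,
}
```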

Customizable event log buffer and digest

Plumtree buffers gossip messages and occasionally creates digests of them in order to ensure that all messages reached their targets. This process could be made more customizable, e.g.:

  1. In Yrs/Yjs what we want to broadcast are document updates. However, updates are not static: new updates can arrive while we wait for confirmation of the previous ones. In Yrs, updates can be merged into a more compact form, meaning that instead of sending two updates U1 and U2 we can send a single update U3 that is their smaller, merged equivalent.
    1. In this scenario the digest of updates is also customized: in fact, Yrs/Yjs already offer a so-called state vector (version vector) that could play this role.
    2. Instead of keeping a standard message log buffer, the Yrs document store itself could perform that role, as it is capable of producing the missing update given the current state of the document or a state vector (our digest).
  2. Git-like partially ordered logs could use a similar technique.
    1. We customize our digest to be the list of current head commit hashes, then compare it against a received Graft(digest) request carrying the remote side's heads to decide which commits need to be propagated.
    2. Instead of keeping a dedicated message log buffer, the commit history itself becomes one.

In both cases the digests sent by IHave(d1) and returned by Graft(d2) don't have to be identical, but we need to be able to define a d2.contains(d1) relationship between them.
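One way to capture both cases is a single buffer abstraction that owns the digest type. The trait below is a sketch with made-up names; a plain message log would be the default implementation, Yrs would back it with the document store, and a Git-like log with the commit history itself:

```rust
/// Hypothetical pluggable gossip buffer.
pub trait BroadcastBuffer {
    /// Opaque digest: a state vector for Yrs, a set of head commit
    /// hashes for a Git-like log.
    type Digest;
    type Update;

    /// Record a locally produced or received update.
    fn push(&mut self, update: Self::Update);

    /// Digest covering everything this buffer has seen; advertised
    /// in IHave messages.
    fn digest(&self) -> Self::Digest;

    /// The d2.contains(d1) relation: does `ours` cover `theirs`?
    fn contains(ours: &Self::Digest, theirs: &Self::Digest) -> bool;

    /// Updates the remote side is missing, given the digest from its
    /// Graft request. For Yrs this may be a single merged update
    /// rather than a replay of the individual ones.
    fn diff(&self, remote: &Self::Digest) -> Vec<Self::Update>;
}
```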

Ideas for extensions

This is another batch of ideas to discuss and possibly implement once a stable MVP is reached.

Broadcast groups

Imagine the following scenarios:

  1. A cross-continental cluster of machines, partitioned over multiple data centres. Each DC partition can itself consist of hundreds of collocated peers.
  2. A P2P chat or collaboration service with thousands of communicating peers, where quite often some of the peers are grouped together, e.g. dozens of peers collocated in the same conference room.

In both cases we deal with groups of possibly collocated peers. Plumtree's first-come-first-served behaviour would probably take good advantage of that at a low code-complexity cost, at least until the underlying HyParView shuffle kicks in. The thing is that such collocation can sometimes exploit cheaper or more performant broadcasting techniques, e.g. subnet multicast or Bluetooth broadcast.

The idea here is that besides being identified by a peer-id, each peer can also belong to a group-id. All peers in the same group are treated as one delivery location. To avoid bottlenecks and increase redundancy, it would be possible to permit N connections to a given group (instead of the usual one connection per peer), where N is much smaller (e.g. 2-3) than the active view size.
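A sketch of what group-aware admission into the active view could look like; `MAX_LINKS_PER_GROUP` and the other names are assumptions for illustration:

```rust
use std::collections::HashMap;

pub type PeerId = u64;
pub type GroupId = u64;

/// Permitted connections per remote group (the N above), kept much
/// smaller than the active view size.
const MAX_LINKS_PER_GROUP: usize = 2;

#[derive(Default)]
pub struct ActiveView {
    /// Active connections bucketed by the group they lead into; the
    /// whole group counts as a single delivery location.
    links: HashMap<GroupId, Vec<PeerId>>,
}

impl ActiveView {
    /// Admit a peer only if its group still has link capacity.
    pub fn try_add(&mut self, group: GroupId, peer: PeerId) -> bool {
        let links = self.links.entry(group).or_default();
        if links.len() < MAX_LINKS_PER_GROUP {
            links.push(peer);
            true
        } else {
            false
        }
    }
}
```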

Multi Plumtree

The thing about Plumtree is that it eagerly uses only a subset of the HyParView active view. The remaining active connections, which are not part of the broadcast tree, are seldom used: only to periodically propagate message digests that verify whether the broadcast tree needs healing. This can lead to an uneven network load distribution, and potentially even bottlenecks or head-of-line blocking.

What we could do is offer multiple channels - each representing its own Plumtree broadcast tree - living over a single HyParView membership view. Each channel's tree could be spread over a different set of connections. This way connection 1 could be used by channel A as an eager connection while being a lazy connection in the context of channel B.
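The per-channel eager/lazy split could be as simple as the sketch below (all names are hypothetical); the key property is that the same peer may sit in the eager set of one channel and the lazy set of another:

```rust
use std::collections::{HashMap, HashSet};

pub type PeerId = u64;
pub type ChannelId = u32;

/// One Plumtree broadcast tree on top of the shared HyParView view.
#[derive(Default)]
struct Channel {
    eager: HashSet<PeerId>, // tree edges: full payloads flow here
    lazy: HashSet<PeerId>,  // off-tree edges: only digests (IHave)
}

#[derive(Default)]
pub struct MultiPlumtree {
    channels: HashMap<ChannelId, Channel>,
}

impl MultiPlumtree {
    /// Is `peer` an eager (tree) edge for this channel?
    pub fn is_eager(&self, channel: ChannelId, peer: PeerId) -> bool {
        self.channels
            .get(&channel)
            .map_or(false, |c| c.eager.contains(&peer))
    }
}
```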

This approach also goes nicely with the WebRTC concept of data channels, however limited: IIRC WebRTC data channels cannot be added/removed dynamically; they must be defined at the moment of negotiating the connection.
