TOPIC DISCOVERY
1. Introduction

Each node in the Ethereum network has a set of tags called "topics" by
which other nodes may choose to associate with it. Topics may indicate
capability, responsibility for certain information or functionality, or
any other attribute by which the node may wish to belong to a connected
subnetwork of the Ethereum network. This document outlines an
infrastructure by which nodes covered by a certain topic may discover
one another.

The presented infrastructure can be used for two distinct purposes:
nodes covered by the same topic can use it to form and maintain a
connected network, while nodes not covered by a certain topic might use
it to find nodes covered by that topic in order to use a service
provided by such nodes.

Topics are assumed to potentially cover arbitrarily large or small parts
of the Ethereum network, meaning that certain topics may contain only
one node while others may span the entire network. Irrespective of the
number of nodes covered by a topic, the discovery infrastructure should
provide a suitable set of bootstrap nodes for new nodes wishing to join
the network, or for those wishing to use the services of nodes covered
by said topic.

Furthermore, it is assumed that each node is covered by at most a few
dozen topics.
2. Topic structure

On the lowest level, topics are assumed to be unstructured, denoted by a
string of printable characters. On a higher level, two possible
hierarchical topic structures are defined: one in which nodes covered by
a topic and all its subtopics are searched, and another in which nodes
covered by a topic and all its supertopics are searched. Subtopics are
denoted by a string that has the parent topic as its prefix, followed by
some delimiting character.

The two higher-level hierarchies are handled as follows. In the first
case, when nodes covered by subtopics need to be discoverable, nodes
must be tagged by their topic and all their supertopics (e.g. a node
covered by "animal.mammal.dog" is also covered by "animal.mammal" and
"animal", so that a search for "animal.mammal" will find it). In the
second case, when nodes covered by supertopics need to be discoverable,
the node doing the discovery also searches for all the supertopics.
Apart from this, the directory is agnostic to topic structure.

The DHT distance measure between a topic and a node's DHT address is
defined as the XOR of the latter with the hash of the topic string.
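
The following Go sketch illustrates this scheme. The hash function
(SHA-256 here), the 256-bit address width and the '.' delimiter are
illustrative assumptions, not fixed by this document.

    package topicdisc

    import (
        "crypto/sha256"
        "strings"
    )

    // TopicHash maps a topic string onto the DHT address space.
    func TopicHash(topic string) [32]byte {
        return sha256.Sum256([]byte(topic))
    }

    // Distance is the XOR of a node's DHT address with the topic hash.
    // Comparing distances as big-endian integers orders nodes by their
    // closeness to the topic.
    func Distance(nodeAddr, topicHash [32]byte) (d [32]byte) {
        for i := range d {
            d[i] = nodeAddr[i] ^ topicHash[i]
        }
        return d
    }

    // Supertopics expands "animal.mammal.dog" into
    // ["animal", "animal.mammal", "animal.mammal.dog"], assuming '.'
    // as the delimiting character.
    func Supertopics(topic string) []string {
        parts := strings.Split(topic, ".")
        out := make([]string, 0, len(parts))
        for i := 1; i <= len(parts); i++ {
            out = append(out, strings.Join(parts[:i], "."))
        }
        return out
    }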
3. Architecture

Two states of discovery are clearly distinguished: before and after the
searching node has made its first successful handshake with a node
covered by the searched topic.

In the first state, the node searches the topic directory, organized as
a DHT and described later in more detail. From the topic directory, it
obtains the contact details (network address, port and public key) of a
relatively small set of matching nodes to which it attempts to connect.
Upon the first successful handshake, it transitions to the second state.

In the second state, nodes covered by a certain topic introduce each
other to the searching node. The strategy followed in this state may
well depend on the requirements of the topic and the desired
(sub-)network topology; the discussion of this second-state search
therefore falls outside the scope of this document. Nodes transition
back to the first state if for any reason they disconnect from all other
nodes covered by their topic.
Each node participating in the DHT stores records for a (potentially)
large number of topics and at most a small number of records for each
topic. Each record binds one node to one topic. In addition to the
topic, a record contains the network address, the port and the public
key of the node, as well as a timestamp and a digital signature over the
rest of the record, produced by the node the record refers to.
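
A record might look like the following Go struct. The field types and
the canonical encoding used for signing are assumptions; the document
does not fix a wire format or signature scheme.

    package topicdisc

    import (
        "net"
        "time"
    )

    // TopicRecord binds one node to one topic, as stored by directory nodes.
    type TopicRecord struct {
        Topic     string
        IP        net.IP    // network address of the registered node
        Port      uint16
        PublicKey []byte    // public key of the registered node
        Timestamp time.Time // time of registration
        Signature []byte    // signature by the registered node over all fields above
    }

    // signedBytes is a hypothetical canonical encoding of the signed
    // fields (everything except the signature itself).
    func (r *TopicRecord) signedBytes() []byte {
        b := []byte(r.Topic)
        b = append(b, r.IP...)
        b = append(b, byte(r.Port>>8), byte(r.Port))
        b = append(b, r.PublicKey...)
        ts, _ := r.Timestamp.MarshalBinary()
        return append(b, ts...)
    }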
4. Registration

Nodes regularly register themselves for each topic covering them by
pushing the corresponding records into the DHT. Only registrations with
a very recent timestamp and a correct signature are accepted. If the DHT
is direct (no content forwarding), the originator network address is
also checked, as nodes are only allowed to register themselves.
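
Continuing the sketch above, a directory node might validate an incoming
registration as follows. The freshness bound and the pluggable signature
check are assumptions made for illustration.

    // maxRecordAge is an assumed freshness bound; the document only
    // requires a "very recent" timestamp.
    const maxRecordAge = 5 * time.Minute

    // acceptRegistration applies the checks described above: a recent
    // timestamp, a valid signature and, in a direct (non-forwarding) DHT,
    // an originator address matching the record itself.
    func acceptRegistration(rec *TopicRecord, origin net.IP, directDHT bool,
        verify func(pub, msg, sig []byte) bool) bool {

        if time.Since(rec.Timestamp) > maxRecordAge {
            return false // stale registration
        }
        if directDHT && !origin.Equal(rec.IP) {
            return false // nodes may only register themselves
        }
        return verify(rec.PublicKey, rec.signedBytes(), rec.Signature)
    }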
In order to prevent overload on nodes with addresses close to popular
topics, the time interval between registrations is controlled as
follows: upon registration, the timestamp of the most recent
registration is returned to the registering node, and the repeated
registration is scheduled with a delay inversely proportional to the age
of that most recent record. If the capacity for records for one topic is
full, a random record for the topic in question is deleted. As a
security measure, nodes with overly frequent registrations may be
blacklisted.
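
A minimal sketch of both sides of this scheme follows, in the same
illustrative package as above. The per-topic capacity, the directory
type and the tuning constant of the registration delay are assumptions;
the document does not specify them.

    package topicdisc

    import (
        "math/rand"
        "time"
    )

    const (
        maxRecordsPerTopic = 16    // assumed per-topic capacity
        regTuning          = 600.0 // seconds^2, an assumed tuning constant
    )

    type directory struct {
        records map[string][]*TopicRecord // stored records, keyed by topic
    }

    // store inserts a record, evicting a random existing record if the
    // topic is at capacity, and returns the timestamp of the most recent
    // record held for that topic before the insertion.
    func (d *directory) store(rec *TopicRecord) time.Time {
        if d.records == nil {
            d.records = make(map[string][]*TopicRecord)
        }
        recs := d.records[rec.Topic]
        var latest time.Time
        for _, r := range recs {
            if r.Timestamp.After(latest) {
                latest = r.Timestamp
            }
        }
        if len(recs) >= maxRecordsPerTopic {
            i := rand.Intn(len(recs)) // drop a random record
            recs[i] = recs[len(recs)-1]
            recs = recs[:len(recs)-1]
        }
        d.records[rec.Topic] = append(recs, rec)
        return latest
    }

    // nextRegistration is run by the registering node: the delay before
    // the repeated registration is inversely proportional to the age of
    // the most recent record, so registrations back off while the topic
    // is being written frequently.
    func nextRegistration(latest time.Time) time.Duration {
        age := time.Since(latest).Seconds()
        if age < 1 {
            age = 1 // avoid division by zero and absurdly long delays
        }
        return time.Duration(regTuning / age * float64(time.Second))
    }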
5. Retrieval

First, a DHT node sufficiently close to the topic is found. Upon request
(i.e. a search by topic), all sufficiently recent records are returned.
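
On the directory side, answering such a request can be as simple as
filtering by freshness, reusing the assumed maxRecordAge bound and the
directory type from the registration sketches above.

    package topicdisc

    import "time"

    // lookup returns all sufficiently recent records stored for a topic.
    func (d *directory) lookup(topic string) []*TopicRecord {
        var out []*TopicRecord
        for _, r := range d.records[topic] {
            if time.Since(r.Timestamp) <= maxRecordAge {
                out = append(out, r)
            }
        }
        return out
    }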
If the DHT is implemented without query and data forwarding, this might
result in excessive hammering of nodes close to popular topics. For such
implementations, it is advisable for registering nodes to register
themselves at every step of the DHT lookup, not only at the nodes
closest to the topic. This way, retrieval lookups are likely to find
registrations closer to themselves and further away from the node
closest to the topic.

Another possible defense against hammering by retrieval requests is to
use them sparingly: both topic member nodes and client nodes must strive
to remain connected to the topic network by querying their neighbors for
further connections, in order to maintain a healthy redundancy of
connectivity without resorting to discovery.

Forwarding DHT implementations, however, avoid the issue altogether by
propagating and caching records along the path between the node on which
a record was found and the querying node.
6. Proposed implementation strategies

a) Leave the UDP-based Kademlia discovery intact and oblivious of
topics, its only purpose being to maintain a connected network of
assorted Ethereum nodes. The mechanism proposed above would be
implemented as a p2p subprotocol (named, e.g., DIR) with a forwarding
Kademlia similar to that of Swarm, based largely on the Swarm codebase.
Instead of storing one Swarm chunk per key, it would store a fixed
number of topic discovery records per key.

b) Add topic discovery directly to the existing UDP-based node discovery
Kademlia and implement the anti-hammering measures proposed in the
previous section.