Urkel Trees: An optimized and cryptographically provable key-value store for decentralized naming

Boyma Fahnbulleh (Handshake)

Merkle Trees are great, but can we do better?
Ethereum's Wish List on better Merkle Trees:
- Wanted a key-value store for the state
- Allow updates without having to reconstruct the entire tree
- Has bounded depth
- History independent: root hash doesn't depend on ordering among updates (i.e., commutative updates)
- Merkle Patricia Tree is basically a fancy radix hash tree
  - Compresses any node that doesn't have more than 1 child
  - Saves a lot of memory—walking down an empty-ish paths is common
  - Stores a key for a value in a DB (like LevelDB)
  - The pointer to the next node in the tree has a cryptographic hash of that node
Looking at these, is there a better construction tailor-made for Handshake?
What is Handshake?
- Blockchain-based DNS naming system, intended to replace the DNS root zone file & servers
- Current root zone embedded in blockchain space, and rest of the space will be auctioned on blockchain
- Light client is super important in this kind of system
  - Regular users who make DNS lookups will want something very light
Handshake's Merkle Tree wish list:
- Key-value store with minimal storage
- Exceptional performance on SSDs (for end consumers)
- Small proof size (< 1kb)
- History independence (again, commutativity of updates)
- Candidates?
  - Rebalancing data structures are no good, since they're not history independent
  - Merkle Patricia Tree, right?
    - But proof sizes are too large, and storage requirements are too large
  - Sparse Merkle Tree?
    - Didn't cut it in terms of performance, requires a bunch of DB lookups
    - Way too many rounds of hashing per insertion
What can we use instead?
Urkel Tree: an optimized and cryptographically provable key value store for decentralized naming
- Intended to be stored in append-only flat files
- Base-2 Merkleized trie
  - Based on the Merkle Set, invented by Bram Cohen
- Two types of nodes: internal nodes and leaf nodes
  - Storing 4 types of nodes like in Patricia Trees sucks
- Append only files gives you crash-consistency
How does it work?
- Basically involves a lot of tombstone nodes and common hash prefixes...
- Honestly, just watch the talk if you want to know. Easier to explain w/ ASCII art in the slides than verbally.
Benchmarks
- 50 million 300-byte leaves, 500 leaf batches, periodic commissions of 44K values
- 500 value batches averaged 100-150ms
- 44K value commissions averaged 400-600ms
- Average node depth of 27-28 bits
- 800 bytes proof size
- 1-2ms proof creation time
One problem: key grinding attacks to force the tree to grow (27 bits is small)
- Bitcoin produces 72-80 bit collisions on block headers all the time
- Can do a base-2 Merkleized radix tree for DoS protection
  - But adds complexity to the code
- Called "Merklix variant", based on Merklix Tree
Summary:
- More performant
  - Urkel Tree allows you to write straight to disk, no underlying KV store
- Simple
  - Only two types of nodes
- Storage
  - Nodes are constant size
- Proof size
  - Are really small

Haseeb-Qureshi/urkel.md

Urkel Trees: An optimized and cryptographically provable key-value store for decentralized naming

Boyma Fahnbulleh (Handshake)