Initial design doc for a new Go client for ETH 2.0. By @protolambda.
Hope the ideas and outlined problems help design the ETH 2.0 successor of Geth ("Firefly"? See F.A.Q.).
This is a starting point for a work-doc for initial contributors; some of the goals may still change.
- Lightweight
- No extras that make the core harder to get right (verification, performance)
- Be cautious with off-the-shelf components; they may add bloat.
- Design to enable lightweight environments (cheap VPS, smartphone) to run it.
- Performant:
- Quick processing, if there is any bottleneck, it should be the network
- Parallel processing where possible (goroutines)
- Design towards good usage of Go channels
- Encapsulation:
- Easy to keep up with spec-changes
- Easy to verify and understand a sub-section of the code
- Clear interfaces:
- A complete plugin system would be too much (initially at least)
- Strong composition pattern: Good fit for Go.
- Easy to test:
- Encapsulation helps
- Hook different components together through channels where feasible
- Experiment-first:
- Don't prioritize completeness over correctness
- Scrutinize ETH 2.0 spec in new ways
- Easy and open for contributions:
- Encapsulated design -> verifiable changes
- Support for experimentation
- "god-object": one state.
- Goal: make changes:
- easy
- readable
- verifiable
- fast
- cached
- Problems:
- Duplicate data when storing as a whole
- Memory. But not all data has to be available at all times.
- Optional: tracking longer history of changes
- Undocumented life-cycle expectations
- Documented: "slot", "block", "epoch"
- An attempt to document the full extent of processing (afaik):
- Block ingestion (either from local node or from network)
- Validate basic requirements (known parent, known eth1 ref, etc.)
- Block pre-processing
- Retrieve state of parent block
- We want a handle/view to the state that:
- can handle changes, without persisting immediately
- reflects unchanged data
- The state can be big, and there may be many being processed in parallel.
- Transition state to slot of ingest-block (or just before, easy to make off-by-1 errors with slots...)
- Change retrieved data, don't persist to disk
- Block processing
- Apply block changes to state
- Encapsulate changes, don't make one ugly transition.
- The spec generally encapsulates it in sections, but we can improve upon that.
- Split transition into multiple files: changing/contributing will be easier
- Make changes apparent: ideally we know what data to serialize again, and which can be retrieved from a cache
- Make validation checks in advance where affordable, to prevent easy DoS with invalid data.
- Block post-processing
- Serialize state parts where it's necessary
- Hash what's necessary (tree-hash)
- Verify state root of block
- Store post-state
- Block storage
- Store block
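
The "handle/view to the state" wanted during block pre-processing could be sketched as an overlay on top of the parent state. This is only a minimal illustration: `State`, `StateView` and their fields are hypothetical, and a real beacon state has far more sections than a slot and a balance map.

```go
package main

import "fmt"

// State is a hypothetical, heavily simplified stand-in for the beacon state.
type State struct {
	Slot     uint64
	Balances map[uint64]uint64 // validator index -> balance
}

// StateView layers pending changes on top of a read-only parent state.
// Nothing is persisted until Commit is called.
type StateView struct {
	parent   *State
	slot     uint64
	balances map[uint64]uint64 // only the overridden entries
}

// NewView creates an empty overlay on top of parent.
func NewView(parent *State) *StateView {
	return &StateView{parent: parent, slot: parent.Slot, balances: map[uint64]uint64{}}
}

// Balance returns the overridden value if present, else the parent's value.
func (v *StateView) Balance(i uint64) uint64 {
	if b, ok := v.balances[i]; ok {
		return b
	}
	return v.parent.Balances[i]
}

// SetBalance records a change in the overlay only.
func (v *StateView) SetBalance(i, b uint64) { v.balances[i] = b }

// SetSlot records the slot the view has been transitioned to.
func (v *StateView) SetSlot(s uint64) { v.slot = s }

// Commit builds the post-state; the parent is left untouched.
func (v *StateView) Commit() *State {
	out := &State{Slot: v.slot, Balances: make(map[uint64]uint64, len(v.parent.Balances))}
	for i, b := range v.parent.Balances {
		out.Balances[i] = b
	}
	for i, b := range v.balances {
		out.Balances[i] = b
	}
	return out
}

func main() {
	parent := &State{Slot: 10, Balances: map[uint64]uint64{0: 32, 1: 32}}
	view := NewView(parent)
	view.SetSlot(11)
	view.SetBalance(1, 31)
	fmt.Println(view.Balance(0), view.Balance(1), parent.Balances[1])
}
```

Many such views can share one parent, which fits the case where several blocks on top of the same parent are processed in parallel. `Commit` copies the whole map here for brevity; avoiding that copy is exactly the storage question raised further down.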
- Attestations (on unfinalized blocks) processing:
- Aggregation is unclear, but necessary:
- Fork-choice based on summing individual attestations is super slow
- Sharing attestations can be optimized with it
- Verifying may be faster
- Aggregate per-target, i.e. ideally we have one batch of attestations per attested block.
- Need to keep track of attestations, we want to produce new slashings ourselves.
- Due to large amounts of attestations, it may require:
- Storing some idle data on disk
- Splitting attestation data:
- Verified stripped-down attestations in memory
- Full attestation data on disk (also for restart, see below)
- Persisting data for use after restart. This is an overlooked storage requirement.
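
Per-target aggregation could look roughly like the sketch below: one batch per attested block, holding the summed weight and the set of validators seen. The `Attestation` shape is a hypothetical stripped-down form (already verified, no signature, no bitfields), not the spec type.

```go
package main

import "fmt"

// Root identifies a block.
type Root [32]byte

// Attestation is a hypothetical stripped-down, already verified attestation.
type Attestation struct {
	Validator uint64
	Target    Root
	Weight    uint64 // assumed: the balance backing this vote
}

// Batch holds everything attested to one target block.
type Batch struct {
	Weight     uint64
	Validators map[uint64]struct{}
}

// Aggregator keeps one batch per target.
type Aggregator struct {
	batches map[Root]*Batch
}

// NewAggregator returns an empty aggregator.
func NewAggregator() *Aggregator { return &Aggregator{batches: map[Root]*Batch{}} }

// Add records an attestation. It returns false when the same validator
// already voted for the same target, so weight is never counted twice.
func (a *Aggregator) Add(att Attestation) bool {
	b, ok := a.batches[att.Target]
	if !ok {
		b = &Batch{Validators: map[uint64]struct{}{}}
		a.batches[att.Target] = b
	}
	if _, seen := b.Validators[att.Validator]; seen {
		return false
	}
	b.Validators[att.Validator] = struct{}{}
	b.Weight += att.Weight
	return true
}

// Weight returns the summed weight attested to target (0 if unknown).
func (a *Aggregator) Weight(target Root) uint64 {
	if b, ok := a.batches[target]; ok {
		return b.Weight
	}
	return 0
}

func main() {
	agg := NewAggregator()
	target := Root{0xaa}
	agg.Add(Attestation{Validator: 1, Target: target, Weight: 32})
	agg.Add(Attestation{Validator: 2, Target: target, Weight: 32})
	dup := agg.Add(Attestation{Validator: 1, Target: target, Weight: 32})
	fmt.Println(agg.Weight(target), dup)
}
```

A fork-choice then only needs one weight per block instead of iterating individual attestations. Moving a validator's vote from an older target to a newer one, and keeping full attestations on disk for slashings, are left out of this sketch.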
- Serialization of the "god-object": no serialization-cache by design.
- Can we generalize the way a state section:
- tracks its changes
- maintains a serialization cache
- can serialize when necessary (soft, use cache)
- is loaded from serialized version
- fill initial cache
- maintains a hashing cache
- hash when necessary (soft use cache, tree-hash)
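
One way to generalize such a state section is a small type that tracks changes and keeps both caches next to the data. The sketch below assumes a section that is just a list of balances, encodes it as little-endian `uint64`s, and uses a plain SHA-256 over the encoding as a stand-in for the tree-hash; none of this is the spec's serialization or hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Section is a hypothetical state subsection with serialization and hash caches.
type Section struct {
	balances []uint64
	cached   bool     // true while encoded and root match balances
	encoded  []byte   // serialization cache
	root     [32]byte // hashing cache
	Encodes  int      // number of real re-serializations, for illustration
}

// NewSection copies balances into a fresh, not yet cached section.
func NewSection(balances []uint64) *Section {
	return &Section{balances: append([]uint64(nil), balances...)}
}

// Load builds a section from its serialized form and fills the initial cache.
func Load(data []byte) (*Section, error) {
	if len(data)%8 != 0 {
		return nil, fmt.Errorf("section length %d is not a multiple of 8", len(data))
	}
	s := &Section{balances: make([]uint64, len(data)/8)}
	for i := range s.balances {
		s.balances[i] = binary.LittleEndian.Uint64(data[8*i:])
	}
	s.encoded = append([]byte(nil), data...)
	s.root = sha256.Sum256(data)
	s.cached = true
	return s, nil
}

// Set changes one value and invalidates the caches only on a real change.
func (s *Section) Set(i int, v uint64) {
	if s.balances[i] == v {
		return
	}
	s.balances[i] = v
	s.cached = false
}

// refresh re-serializes and re-hashes only when the section changed.
func (s *Section) refresh() {
	if s.cached {
		return
	}
	buf := make([]byte, 8*len(s.balances))
	for i, b := range s.balances {
		binary.LittleEndian.PutUint64(buf[8*i:], b)
	}
	s.encoded, s.root = buf, sha256.Sum256(buf)
	s.cached = true
	s.Encodes++
}

// Serialize returns the (possibly cached) encoding.
func (s *Section) Serialize() []byte { s.refresh(); return s.encoded }

// Root returns the (possibly cached) hash.
func (s *Section) Root() [32]byte { s.refresh(); return s.root }

func main() {
	s := NewSection([]uint64{1, 2, 3})
	before := s.Root()
	s.Set(1, 9)
	fmt.Println(before != s.Root(), s.Encodes)
}
```

The hash doubles as a storage key for the subsection, which is what the storage ideas later in this doc build on. A real tree-hash cache would additionally reuse unchanged subtrees instead of rehashing the whole section.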
- Managing unfinalized blocks:
- Fork rule execution, spec is slow
- Easy & quick access to necessary data for fork-choice:
- block hashes
- block parent-hashes
- block slot
- optional: block height (count since genesis)
- Possibly like a small DAG, easy to implement fork-rule on top of
- Quickly changing head, prevent big updates.
- The point of time where we want to re-determine the head:
- On ingesting a block, when already fully synced
- When unprocessed weight of collected (and aggregated) attestations surpasses a threshold
- After syncing
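
The three moments listed for re-determining the head can be captured in a tiny trigger. This is a sketch under assumptions: `HeadTrigger` and its method names are made up for illustration, and the threshold value is arbitrary.

```go
package main

import "fmt"

// HeadTrigger decides when the fork-choice should run again.
// Each method returns true when the head should be re-determined.
type HeadTrigger struct {
	Threshold uint64 // unprocessed attestation weight that forces a re-run
	pending   uint64 // weight collected since the last re-run
	synced    bool
}

// OnSynced marks the node as synced; the head is determined right after.
func (h *HeadTrigger) OnSynced() bool {
	h.synced = true
	h.pending = 0
	return true
}

// OnBlock triggers a re-run for every ingested block once fully synced.
func (h *HeadTrigger) OnBlock() bool {
	if !h.synced {
		return false
	}
	h.pending = 0
	return true
}

// OnAttestationWeight accumulates weight and triggers a re-run only when
// the unprocessed weight surpasses the threshold.
func (h *HeadTrigger) OnAttestationWeight(w uint64) bool {
	h.pending += w
	if !h.synced || h.pending <= h.Threshold {
		return false
	}
	h.pending = 0
	return true
}

func main() {
	trigger := &HeadTrigger{Threshold: 100}
	trigger.OnSynced()
	fmt.Println(trigger.OnAttestationWeight(60), trigger.OnAttestationWeight(60))
}
```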
- Provide access to events
- Implement subscriptions with channels
- Possibly use Go-ethereum or other events implementation?
- Implement some of the getter/streamer RPC functionality on top of this.
- Validator should be able to access this easily and quickly.
- Choice: do we connect our validator node(s) via:
- RPC
- direct to events
- both?
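
Subscriptions with channels could be as small as the feed below. It is a hand-rolled sketch, not the go-ethereum events implementation; `Feed`, `Event` and the drop-when-full policy are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical notification, e.g. a new head or a processed block.
type Event struct {
	Kind string
	Slot uint64
}

// Feed fans events out to subscribers over buffered channels.
type Feed struct {
	mu   sync.Mutex
	subs map[int]chan Event
	next int
}

// NewFeed returns a feed without subscribers.
func NewFeed() *Feed { return &Feed{subs: map[int]chan Event{}} }

// Subscribe returns a receive-only channel and a function to unsubscribe.
func (f *Feed) Subscribe(buffer int) (<-chan Event, func()) {
	f.mu.Lock()
	defer f.mu.Unlock()
	id := f.next
	f.next++
	ch := make(chan Event, buffer)
	f.subs[id] = ch
	return ch, func() {
		f.mu.Lock()
		defer f.mu.Unlock()
		if c, ok := f.subs[id]; ok {
			delete(f.subs, id)
			close(c)
		}
	}
}

// Publish sends ev to every subscriber without blocking and returns how
// many received it. A subscriber with a full buffer misses the event.
func (f *Feed) Publish(ev Event) int {
	f.mu.Lock()
	defer f.mu.Unlock()
	delivered := 0
	for _, ch := range f.subs {
		select {
		case ch <- ev:
			delivered++
		default: // slow subscriber: drop rather than stall block processing
		}
	}
	return delivered
}

func main() {
	feed := NewFeed()
	events, cancel := feed.Subscribe(4)
	feed.Publish(Event{Kind: "head", Slot: 42})
	cancel()
	for ev := range events {
		fmt.Println(ev.Kind, ev.Slot)
	}
}
```

A validator running in the same process could read such a channel directly, while an RPC streamer would be just another subscriber, which covers the "both?" option. Whether dropping events for slow subscribers is acceptable is a design choice that still needs to be made.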
- Provide storage for state
- Make pruning easy
- Avoid duplication of data (E.g. storing the full validators list every slot, when it only changes every epoch, or with upcoming changes maybe every so often, but likely not continuously)
- Make lookups fast
- Writing speed is not so important, the latest-states may be cached in memory
- Possibly support some sort of queries
- Possibly support fetching of ranges of data
- Provide storage for blocks
- Block storage is mostly there to sync other non-light peers with.
- Writing speed is more important (afaik)
- Make pruning easy
- Iteration of keys may be completely unnecessary: we have latest-blocks references in state now.
- Provide storage for attestations
- See attestation comments above.
- Persisting for after restart
- Used to create slashings, and restore after restart (or we handle that separately, e.g. persist DAG)
- We could abstract aggregation per-target by indexing storage by target.
- Integrate BLS
- Many teams had problems integrating BLS, mostly due to reliance on cross-language or native-lib interactions.
- Don't roll your own crypto, yet you have to find something that works well.
- We could share effort with Prysmatic here. (Licensing needs a closer look, however: the wrapper has a different license than the external library underneath.)
- Better Serialization patterns (optimize accesses and caching)
- Experiment with SOS format
- Better Attestation Aggregation
- Implement batching well, useful for fast fork-choice
- Implement fork-choice cache on top of batch: i.e. track changes in weights of batches (Already hacked together experimental version in LMD-GHOST simulation, here)
- Cache can be partially processed: only if change in weights is big enough. Good for speed. Sort of supported by spec (`FORK_CHOICE_BALANCE_INCREMENT`).
- Experiment with storage solutions
- Possibly implement state transitions with a decoration-pattern:
- Like composition, but with a clear order, generic, extensible, and relatively easy to implement with Go interfaces. Seems like a good fit to avoid clumsy inheritance approaches that don't fit Go.
- Provides some good benefits: extensible, good encapsulation and clear processing order
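
A possible shape for the decoration pattern with Go interfaces is sketched below. `Transition`, `Decorator`, `Step` and `Build` are hypothetical names, and the `State` here only carries a slot and a log of applied steps to make the order visible.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInvalid is an example error a transition step may return.
var ErrInvalid = errors.New("invalid block")

// State is a hypothetical, minimal state.
type State struct {
	Slot uint64
	Log  []string // names of the steps applied, in order
}

// Transition is one (possibly composed) state transition.
type Transition interface {
	Apply(s *State) error
}

// TransitionFunc adapts a function to the Transition interface.
type TransitionFunc func(s *State) error

// Apply calls the wrapped function.
func (f TransitionFunc) Apply(s *State) error { return f(s) }

// Decorator wraps a transition with one extra step.
type Decorator func(inner Transition) Transition

// Step returns a decorator that runs fn after everything it wraps.
func Step(name string, fn func(*State) error) Decorator {
	return func(inner Transition) Transition {
		return TransitionFunc(func(s *State) error {
			if err := inner.Apply(s); err != nil {
				return err
			}
			if err := fn(s); err != nil {
				return fmt.Errorf("%s: %w", name, err)
			}
			s.Log = append(s.Log, name)
			return nil
		})
	}
}

// Build composes the steps so they execute in the order given.
func Build(steps ...Decorator) Transition {
	var t Transition = TransitionFunc(func(*State) error { return nil })
	for _, d := range steps {
		t = d(t)
	}
	return t
}

func main() {
	transition := Build(
		Step("slot", func(s *State) error { s.Slot++; return nil }),
		Step("block", func(s *State) error { return nil }),
	)
	st := &State{}
	fmt.Println(transition.Apply(st), st.Slot, st.Log)
}
```

Each step can live in its own file and be tested on its own, and an experiment can insert or swap a step without touching the others.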
- Implement a DAG (a lot like a tree here tho)
- Fork-choice from DAG (already implemented here)
- much faster than retrieval of blocks (no unnecessary traversal or allocations)
- more minimal: dag-nodes don't need complete block data
- Easy slot-based pruning (could leave disconnected graph components, but eventually pruned)
- Reasonable branch-based pruning, if even necessary
- Quick to switch head, and justified block
- Complete understanding of available forks
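
To make the DAG idea concrete, here is a naive sketch of a block DAG with a GHOST-style head walk. It is not the existing implementation referred to above: the node fields follow the list of fork-choice data earlier in this doc, subtree weights are recomputed on every walk (no cache), and ties go to the first-inserted child, which is an arbitrary choice made for this example.

```go
package main

import "fmt"

// Root identifies a block.
type Root [32]byte

// Node holds only what fork-choice needs, not the full block.
type Node struct {
	Root     Root
	Parent   *Node
	Children []*Node
	Slot     uint64
	Weight   uint64 // attestation weight pointing directly at this block
}

// DAG indexes unfinalized blocks by root.
type DAG struct {
	nodes map[Root]*Node
}

// NewDAG creates a DAG containing only the given starting block.
func NewDAG(genesis Root) *DAG {
	return &DAG{nodes: map[Root]*Node{genesis: &Node{Root: genesis}}}
}

// Add inserts a block under a known parent; re-adding a block is a no-op.
func (d *DAG) Add(root, parent Root, slot uint64) error {
	p, ok := d.nodes[parent]
	if !ok {
		return fmt.Errorf("unknown parent %x", parent[:4])
	}
	if _, dup := d.nodes[root]; dup {
		return nil
	}
	n := &Node{Root: root, Parent: p, Slot: slot}
	p.Children = append(p.Children, n)
	d.nodes[root] = n
	return nil
}

// AddWeight adds attestation weight to a known block.
func (d *DAG) AddWeight(root Root, w uint64) {
	if n, ok := d.nodes[root]; ok {
		n.Weight += w
	}
}

// subtreeWeight sums the weight of n and all of its descendants.
func subtreeWeight(n *Node) uint64 {
	total := n.Weight
	for _, c := range n.Children {
		total += subtreeWeight(c)
	}
	return total
}

// Head walks down from start, following the heaviest subtree at each fork.
func (d *DAG) Head(start Root) (Root, bool) {
	n, ok := d.nodes[start]
	if !ok {
		return Root{}, false
	}
	for len(n.Children) > 0 {
		best, bestW := n.Children[0], subtreeWeight(n.Children[0])
		for _, c := range n.Children[1:] {
			if w := subtreeWeight(c); w > bestW {
				best, bestW = c, w
			}
		}
		n = best
	}
	return n.Root, true
}

func main() {
	g, a, b, c := Root{0}, Root{1}, Root{2}, Root{3}
	dag := NewDAG(g)
	dag.Add(a, g, 1)
	dag.Add(b, g, 1)
	dag.Add(c, a, 2)
	dag.AddWeight(b, 10)
	dag.AddWeight(a, 5)
	dag.AddWeight(c, 6)
	head, _ := dag.Head(g)
	fmt.Printf("%x\n", head[:1])
}
```

In the example, `b` has the most direct weight, but the subtree under `a` is heavier, so the walk ends at `c`. Caching subtree weights and updating them from attestation batches would remove the repeated recursion.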
- Synergize DAG <-> storage.
- If state storage really needs to be minimal, state can be split up:
- state-subsection caches already provide:
- structured information
- storage key (hash of subsection)
- traverse DAG from head to retrieve all necessary state.
- mark DAG node if it contains a change in storage of a subsection
- walk back from head, and load state sections for first change-mark
- No duplication, of any data, even with forks!
- Maybe too ambitious; whether it is worth it is still to be decided, given state size and the processing bottleneck
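
The "walk back from head" lookup can be sketched as follows, assuming each DAG node carries a change-mark per subsection in the form of a storage key. `Node`, `Key`, `Resolve` and the section names are hypothetical.

```go
package main

import "fmt"

// Key is a storage key, assumed to be the hash of a serialized subsection.
type Key [32]byte

// Node is a DAG node that only records which subsections changed at
// this block, together with the storage key of the new version.
type Node struct {
	Parent  *Node
	Slot    uint64
	Changed map[string]Key // section name -> storage key
}

// Resolve walks back from head and returns, for every requested section,
// the key stored at the nearest ancestor that changed it.
func Resolve(head *Node, sections []string) (map[string]Key, error) {
	found := make(map[string]Key, len(sections))
	for n := head; n != nil && len(found) < len(sections); n = n.Parent {
		for _, name := range sections {
			if _, done := found[name]; done {
				continue
			}
			if k, ok := n.Changed[name]; ok {
				found[name] = k
			}
		}
	}
	for _, name := range sections {
		if _, ok := found[name]; !ok {
			return nil, fmt.Errorf("no stored version of section %q on this branch", name)
		}
	}
	return found, nil
}

func main() {
	genesis := &Node{Changed: map[string]Key{"validators": Key{1}, "balances": Key{2}}}
	a := &Node{Parent: genesis, Slot: 1, Changed: map[string]Key{"balances": Key{3}}}
	b := &Node{Parent: a, Slot: 2} // nothing changed at this block
	keys, err := Resolve(b, []string{"validators", "balances"})
	fmt.Println(keys["validators"][0], keys["balances"][0], err)
}
```

Forks that share an ancestor end up pointing at the same stored subsection, which is where the "no duplication" property comes from. The cost is a walk that can be as long as the distance to the last change, so rarely changing sections may need a periodic checkpoint.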
- Licensing, commercial use is really not that bad.
- Arguably built too quickly for production: i.e. off-the-shelf components are preferred over working with and on the spec. This advances the spec less.
- More people need to scrutinize the spec in different ways.
- More options available
- Experiment with big new ideas.
The initial phase of starting a new client implementation is messy, but everyone is welcome. Please get in touch with the others on the go-ethereum discord (firefly channel).
Honestly, no idea. But @karalabe (Go-ethereum dev, Péter Szilágyi) started a repository, and I am looking for a more experimental approach, implemented in Go, than Prysm.
Something with "light", as a reference to the beacon-chain. Similar to the naming of other ETH 2.0 clients (e.g. Prysm, Lodestar, Lighthouse, Artemis).
Yes, we know that there is some hardware wallet with the same name. If you have a better name for this project, please let us know.