Felföldi Zsolt zsfelfoldi


State size limitation proposal

totalSize(block): the sum of the sizes of the RLP-encoded trie nodes and contract codes in the account trie and storage tries. Items or subtrees referenced multiple times are counted multiple times. This needs a database upgrade to store totalSubtreeSize for each internal node; the total size is then the total subtree size of the state root. Storing it in consensus is not needed.

sizeLimit(block): specified in the protocol and can only be changed for future blocks in a hard fork. (This only applies to the state; chain history is not discussed here.)

Individual entry size

Since internal account trie nodes do not belong to a single entry, we add an estimated value overheadEstimate to the actual entry size when calculating the individual entry size on which the actual pricing is based.
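A minimal sketch of the totalSize definition above, over a toy in-memory trie. The node type and its fields are invented for illustration; the real computation would run over RLP-encoded database nodes with a cached totalSubtreeSize per internal node.

```go
// Sketch: totalSize as the total subtree size of the state root,
// counting shared subtrees once per reference. Types are illustrative,
// not geth's actual trie node types.
package main

import "fmt"

type node struct {
	encodedSize uint64 // size of this node's RLP encoding (or contract code)
	children    []*node
}

// totalSubtreeSize counts every node once per reference, so subtrees
// referenced multiple times are counted multiple times, matching the
// proposal's definition of totalSize.
func totalSubtreeSize(n *node) uint64 {
	if n == nil {
		return 0
	}
	total := n.encodedSize
	for _, c := range n.children {
		total += totalSubtreeSize(c)
	}
	return total
}

func main() {
	shared := &node{encodedSize: 10}
	root := &node{encodedSize: 5, children: []*node{shared, shared}}
	fmt.Println(totalSubtreeSize(root)) // shared subtree counted twice: 5 + 10 + 10 = 25
}
```

In a real database upgrade the per-node totalSubtreeSize would be stored alongside each internal node, so this recursion only runs incrementally on updates rather than over the whole trie.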

^Cpanic: boom
goroutine 150 [running]:
github.com/ethereum/go-ethereum/internal/debug.LoudPanic(0xdffc20, 0x1142b30)
/home/fefe/go/src/github.com/ethereum/go-ethereum/internal/debug/loudpanic.go:26 +0x4e
github.com/ethereum/go-ethereum/cmd/utils.StartNode.func1(0xc420112280)
/home/fefe/go/src/github.com/ethereum/go-ethereum/cmd/utils/cmd.go:79 +0x23a
created by github.com/ethereum/go-ethereum/cmd/utils.StartNode
/home/fefe/go/src/github.com/ethereum/go-ethereum/cmd/utils/cmd.go:65 +0xb7

You can run TrueBit verification games for C/C++ code on both Kovan and Rinkeby testnets, and we welcome you to test our code! Please start with this Docker container:

https://github.com/TrueBitFoundation/test-node-docker

We have a VM based on WASM, with some changes that make it easier to interpret. Here is some documentation: https://github.com/TrueBitFoundation/ocaml-offchain/wiki/Initializing-and-preprocessing-WebAssembly

Currently, the code has to be written in C/C++ or Rust, and for input and output the program has to read and write files. On the blockchain, the files are identified by their binary Merkle roots. For example, to calculate the SHA-256 hash of a large file, the task would take the Merkle root of the file as an argument. But if the Merkle root is nonsense and somebody posts a bad solution, there is no efficient way to disprove it (the data availability problem). So there might be some problems with checking data that is not itself available on the blockchain.
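To make the file-identification scheme above concrete, here is a sketch of computing a binary Merkle root over a file's chunks: hash each chunk as a leaf, then hash pairs upward to a single root. The chunking, the leaf hash function, and the handling of odd nodes are assumptions for illustration; TrueBit's actual format may differ.

```go
// Sketch of a binary Merkle root identifying a file by its chunks.
// Chunk size, SHA-256 leaves, and promoting an unpaired node are
// assumptions, not TrueBit's specified format.
package main

import (
	"crypto/sha256"
	"fmt"
)

func merkleRoot(chunks [][]byte) [32]byte {
	if len(chunks) == 0 {
		return sha256.Sum256(nil)
	}
	// Hash every chunk into a leaf.
	level := make([][32]byte, len(chunks))
	for i, c := range chunks {
		level[i] = sha256.Sum256(c)
	}
	// Pairwise-hash up to a single root.
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // unpaired node promoted (assumption)
			} else {
				pair := append(level[i][:], level[i+1][:]...)
				next = append(next, sha256.Sum256(pair))
			}
		}
		level = next
	}
	return level[0]
}

func main() {
	root := merkleRoot([][]byte{[]byte("chunk0"), []byte("chunk1")})
	fmt.Printf("%x\n", root)
}
```

The data availability caveat is visible here: anyone can post a 32-byte root, but unless the chunks behind it are actually retrievable, a verifier has nothing to recompute the root from.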

Checkpoint syncing

Checkpoint syncing requires a CHT (Canonical Hash Trie), a trie that maps block numbers to canonical block hashes and TDs (total chain difficulty values).

https://github.com/zsfelfoldi/go-ethereum/wiki/Canonical-Hash-Trie

Knowing the root hash of this trie allows a client to securely access the entire chain history with Merkle proofs, so only the last few thousand headers need to be downloaded. Right now we have a hardcoded trusted CHT root hash in geth.

Trustless syncing logic

Here are the two basic structures required for chain-based logging. This is just a rough draft; no need to follow it exactly.

ObserverChain

ObserverChain (implemented in les/observer package) creates observer blocks, stores them in the database and can access them later. Observer blocks have the following fields:

  • PrevHash common.Hash
  • Number uint64
  • UnixTime uint64
  • TrieRoot common.Hash // root hash of a trie.Trie structure that is updated for every new block
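The field list above can be written out as a Go struct. This mirrors the draft directly; the `Hash` stand-in below takes the place of go-ethereum's `common.Hash`, and everything beyond the listed fields is an assumption.

```go
// Sketch of an observer block with exactly the fields listed in the
// draft. Hash stands in for common.Hash ([32]byte).
package main

import "fmt"

type Hash [32]byte // stand-in for common.Hash

type ObserverBlock struct {
	PrevHash Hash   // hash of the previous observer block
	Number   uint64 // observer block number
	UnixTime uint64 // creation timestamp
	TrieRoot Hash   // root of the trie.Trie updated for every new block
}

func main() {
	genesis := ObserverBlock{Number: 0, UnixTime: 1500000000}
	next := ObserverBlock{Number: genesis.Number + 1, UnixTime: genesis.UnixTime + 15}
	fmt.Println(next.Number)
}
```

The ObserverChain implementation in les/observer would create these blocks, persist them, and chain them together via PrevHash.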

State trie GC proposal 2

This GC scheme proposal is somewhat related to my first proposal (https://gist.github.com/zsfelfoldi/5c4f36fb8a898acd092a62dea4336f88) but is also closer to the reference-counting approach, because it does not duplicate trie nodes, it just stores extra reference data. Instead of merely counting references to nodes, it stores all refs; this is still not a big overhead but greatly simplifies the GC process. Iterating through the old tries is not necessary: we can just scan the database and remove expired refs and data. Since nodes are still referenced by hash, there is also no need to modify the fast sync protocol.

Note: I still believe that reorganizing nodes by position (as in my first proposal) might yield some performance improvements, but if we can keep most of the state in memory, this is not really relevant. This GC variant is definitely easier and safer to implement.
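The "store all refs, then scan" idea above can be sketched as follows: each trie node entry carries the block numbers that reference it, and GC is a single database pass that drops expired refs and deletes any node left with none. The data layout and cutoff logic are invented for illustration.

```go
// Sketch: GC by scanning entries and pruning expired refs, with no
// trie iteration. The layout (refs as block numbers per node) is an
// assumption for this sketch.
package main

import "fmt"

type nodeEntry struct {
	data []byte
	refs []uint64 // block numbers referencing this node
}

// gcScan drops refs from blocks <= expireBefore and deletes nodes left
// without any live reference.
func gcScan(db map[string]*nodeEntry, expireBefore uint64) {
	for hash, e := range db {
		live := e.refs[:0]
		for _, r := range e.refs {
			if r > expireBefore {
				live = append(live, r)
			}
		}
		e.refs = live
		if len(e.refs) == 0 {
			delete(db, hash)
		}
	}
}

func main() {
	db := map[string]*nodeEntry{
		"old": {data: []byte{1}, refs: []uint64{3}},
		"new": {data: []byte{2}, refs: []uint64{3, 9}},
	}
	gcScan(db, 5) // expire refs from blocks <= 5
	fmt.Println(len(db)) // only "new" survives, with one live ref
}
```

Because nodes stay keyed by hash, this pass is independent of trie structure, which is what lets fast sync keep working unmodified.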

Database format

  • Trie nodes are stored as they are now: trie node hash -> `

State trie GC proposal

This is my proposal for efficiently garbage collecting tries. It is low-overhead and easy to maintain, so hopefully not painful in the long term. In the short term it is a bit painful, because it changes the database structure and also requires a protocol change for fast sync (not for LES). Still, I think it is manageable, and I feel that this is the "right" way to store/access/reference trie nodes. It could also significantly increase both EVM and fast sync performance.

Database format change

  • old format: (trie node hash) -> (trie node)
  • new format: (key prefix) (trie node hash) (creation block number) -> (trie node)
  • new format for contract storage tries: (contract key) (storage key prefix) (trie node hash) (creation block number) -> (trie node)
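The new key formats in the bullets above can be sketched as byte-string concatenation: prefix, then the trie node hash, then the creation block number (with the contract key prepended for storage tries). The exact prefix bytes and integer widths are assumptions.

```go
// Sketch of assembling the proposed database keys. Widths and prefix
// values are assumptions, not a specified encoding.
package main

import (
	"encoding/binary"
	"fmt"
)

// nodeKey builds (key prefix)(trie node hash)(creation block number).
func nodeKey(prefix []byte, hash [32]byte, createdAt uint64) []byte {
	k := append([]byte{}, prefix...)
	k = append(k, hash[:]...)
	var num [8]byte
	binary.BigEndian.PutUint64(num[:], createdAt)
	return append(k, num[:]...)
}

// storageNodeKey prepends the contract key for contract storage tries:
// (contract key)(storage key prefix)(trie node hash)(creation block number).
func storageNodeKey(contractKey []byte, prefix []byte, hash [32]byte, createdAt uint64) []byte {
	return append(append([]byte{}, contractKey...), nodeKey(prefix, hash, createdAt)...)
}

func main() {
	key := nodeKey([]byte("n"), [32]byte{0x12}, 4200000)
	fmt.Println(len(key)) // 1 + 32 + 8 = 41 bytes
}
```

Encoding the creation block number big-endian at the end keeps all versions of a node's key adjacent and ordered in the database, which is convenient for range scans during GC.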

There are two more or less separate directions I'd like to pursue for testing LES. One is testing the actual protocol functionality through all available APIs and platforms in a synthetic environment: an isolated private chain and test nodes (probably created with Puppeth), with reproducible API call test sequences and controlled peer connections/disconnections. The other is testing automatic peer connectivity behavior and request load distribution in a real-world setting, with our own LES servers doing actual service on the mainnet while logging certain events and collecting them in a database.
Right now LES is practically unusable on the public network because of serious peer connectivity issues, so I believe the "real-world" testing is more urgent at the moment. A number of factors could contribute to the connectivity problems:
- Peer discovery
LES uses an experimental DHT that allows node capability advertisement. This DHT has some problems and it would make sense to log

Functions specified in the ENR

  • ENR.dhtAddress(): the Kademlia address of a node, calculated according to the ID scheme
  • ENR.cipher( packetData, key ): cipher algorithm
  • ENR.symmEncryptKey( privKeyA, pubKeyB ): symmetric encryption key generator
  • ENR.asymmEncrypt( packetData, recipient.pubKey ): asymmetric encrypt function
  • ENR.signature( packetData, pubKey ): digital signature
  • ENR.powValid( packetHash, packetType, packetFormat ): proof of work validity check
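The function list above can be collected into a Go interface sketch. The parameter and return types below are assumptions; the source only names the functions and their arguments, so a trivial stub is included just to show the shape.

```go
// Sketch of the ENR-specified functions as a Go interface. All
// signatures are assumptions inferred from the argument names above.
package main

import "fmt"

type ENRScheme interface {
	DHTAddress() []byte                                     // Kademlia address per the ID scheme
	Cipher(packetData, key []byte) []byte                   // cipher algorithm
	SymmEncryptKey(privKeyA, pubKeyB []byte) []byte         // symmetric encryption key generator
	AsymmEncrypt(packetData, recipientPubKey []byte) []byte // asymmetric encrypt function
	Signature(packetData, pubKey []byte) []byte             // digital signature
	PowValid(packetHash []byte, packetType, packetFormat uint) bool // proof-of-work validity check
}

// stubENR is a do-nothing implementation, only to show the interface
// being satisfied; a real ID scheme would fill in the cryptography.
type stubENR struct{}

func (stubENR) DHTAddress() []byte                                     { return nil }
func (stubENR) Cipher(packetData, key []byte) []byte                   { return packetData }
func (stubENR) SymmEncryptKey(privKeyA, pubKeyB []byte) []byte         { return nil }
func (stubENR) AsymmEncrypt(packetData, recipientPubKey []byte) []byte { return packetData }
func (stubENR) Signature(packetData, pubKey []byte) []byte             { return nil }
func (stubENR) PowValid(packetHash []byte, packetType, packetFormat uint) bool { return true }

func main() {
	var e ENRScheme = stubENR{}
	fmt.Println(e.PowValid(nil, 0, 0))
}
```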