Blob Propagation Issues Mar 27-28th

On March 27-28, the Ethereum network suffered an extremely high rate of missed slots. Most of these slots were first relayed by the bloXroute relays. We identified that the bloXroute relays worked properly throughout the incident, publishing blocks and blobs correctly. However, they propagated blocks quickly through the BDN, while the blob sidecars propagated more slowly through the p2p network (sidecars are expected to propagate more slowly, and may be accepted until t=8 sec). This uncovered a specific CL behavior that caused clients to reject these blocks, resulting in missed slots.

In the current Lighthouse version, the node is expecting the peer that first provided the block to also provide the blobs. The BDN does not propagate blobs, and that caused the BDN-connected consensus nodes to ignore blocks that were first received from the BDN. A recent release of the BDN improved the speed of gossiped blocks without blobs, relying on the rest of the p2p network to propagate blobs as needed, which caused the significant increase in missed slots.

The BDN relies heavily on Lighthouse, which makes up the majority of our beacon nodes at bloXroute, due to its performance and speed. Post release, we witnessed successful block propagation through our BDN and assumed the release was valid. The issue also showed up mainly on the bloXroute relays due to their tight coupling with the BDN. The speed with which the BDN delivered blocks to beacon nodes caused this behavior even in scenarios where other relays were publishing blocks that bloXroute did not have.

Throughout this time, the bloXroute relays were providing blocks with blobs back to validators and also publishing blocks with blobs to our BDN and to our network of beacon nodes. These publish requests would return a 202 response because the beacon nodes had already seen that block from the BDN.
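To make the sequence of events easier to follow, here is a minimal toy model in Python. It is not Lighthouse code. The class `ToyBeaconNode`, its methods, and the assumption that a duplicate block publish returns early without handling the attached blobs are all hypothetical, chosen to be consistent with the 202 responses described above and with the HTTP-level explanation given in the comments below. The 200 status for a fresh publish is likewise only an illustrative choice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBeaconNode:
    """Hypothetical toy model of a beacon node; not real client code."""

    seen_blocks: set = field(default_factory=set)
    # Maps block root -> set of blob indices this node has received.
    seen_blobs: dict = field(default_factory=dict)

    def on_gossip_block(self, block_root):
        # Block arrives without blobs (e.g. via a fast block-only path).
        self.seen_blocks.add(block_root)

    def on_gossip_blob(self, block_root, index):
        # Blob sidecar arrives over p2p gossip.
        self.seen_blobs.setdefault(block_root, set()).add(index)

    def http_publish(self, block_root, blob_indices):
        if block_root in self.seen_blocks:
            # ASSUMPTION: the block is a duplicate, so the node answers 202
            # and returns early, never processing the attached blobs.
            return 202
        self.seen_blocks.add(block_root)
        for index in blob_indices:
            self.on_gossip_blob(block_root, index)
        return 200

    def can_import(self, block_root, expected_blobs):
        # A block is importable only once all of its blobs are available.
        have = len(self.seen_blobs.get(block_root, set()))
        return block_root in self.seen_blocks and have >= expected_blobs


# Block reaches the node first without blobs, then the relay publishes
# block + blobs over HTTP: the publish is treated as a duplicate.
node = ToyBeaconNode()
node.on_gossip_block("0xabc")
status = node.http_publish("0xabc", [0, 1])
importable = node.can_import("0xabc", expected_blobs=2)
```

In this sketch the node stays unable to import the block until the blobs arrive by some other route, for example through `on_gossip_blob`. A node that sees the HTTP publish first takes the other branch and has everything it needs immediately.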

We resolved the issue after a series of tests isolated it to Lighthouse's behavior after seeing a block first through the BDN. We then gradually migrated our relay away from using the BDN for block publishing, and finally disabled the BDN's propagation of any blocks containing blobs.

@benhenryhunter
Author

Also, @djrtwo, for more context from Sproul: the behavior is seen when blobs are not provided over p2p.
[Image: Screen Shot 2024-03-29 at 4 32 17 PM]

@michaelsproul

michaelsproul commented Mar 29, 2024

@benhenryhunter I haven't seen any evidence of a Lighthouse bug at the p2p level. This statement of yours is not correct:

"In the current Lighthouse version, the node is expecting the peer that first provided the block to also provide the blobs."

What Danny wrote is correct: Lighthouse's issue is at the HTTP level, and is only reachable if the only blobs sent by the relay are sent via HTTP to nodes that have already received the block via gossip. See my full response on Twitter: https://twitter.com/sproulM_/status/1773853486373130708
