
@zmanian
Created February 16, 2026 04:54

Proposer-Aware Mempool for CometBFT

Problem

CometBFT uses flood gossip for mempool propagation: every validator receives every transaction regardless of whether it is the current block proposer. In a 100-validator network, 99 validators maintain redundant copies of the mempool that they will never use to build a block during the current slot. This wastes bandwidth and creates unnecessary load.

Insight

The proposer schedule in CometBFT is fully deterministic. All validators independently compute the same proposer for any (height, round) pair using weighted round-robin -- no communication needed. This means we can route transactions preferentially to upcoming proposers without any new consensus messages.

Architecture

Tx Sources ──► [Priority Routing] ──► Current Proposer (+ next 2)
                                  ──► Other Validators (delayed)

Three Implementation Layers

Layer 1: ProposerOracle Interface (Go, in CometBFT)

Expose the existing proposer schedule to the mempool reactor:

type ProposerOracle interface {
    ProposerAt(height int64, round int32) types.Address
    UpcomingProposers(height int64, n int) []types.Address
    IsUpcomingProposer(addr types.Address, lookahead int) bool
    CurrentHeight() int64
    CurrentRound() int32
}

Implementation wraps types.ValidatorSet with thread-safe state updates. The consensus module already computes this -- we just need to pipe it to the mempool.

Modify broadcastTxRoutine with soft priority (not hard filter):

if memR.proposerOracle != nil && memR.peerToValidator != nil {
    validatorAddr := memR.peerToValidator(peer.ID())
    if !memR.proposerOracle.IsUpcomingProposer(validatorAddr, 3) {
        time.Sleep(50 * time.Millisecond) // deprioritize, don't block
    }
}

All peers still receive all transactions. Proposers just get them faster. Backward compatible: nil oracle = existing flood behavior.

Implementation: zmanian/cometbft@feature/proposer-aware-mempool

Layer 2: Pipeline Pre-Connection (Mosaik demo)

Reactive tag-based routing can't keep up with fast BFT rotation (500 ms slots): the tag-propagation plus stream-reconnection cycle takes ~100-400 ms, consuming most of a slot before any transaction flows. Solution: pre-tag upcoming proposers using the deterministic schedule.

3-tier tag system:

Tag             Meaning                        Lead Time
proposer        Current slot's block builder   0 ms
proposer-next   Next slot's proposer           500 ms
proposer-soon   Slot+2 proposer                1000 ms

Stream consumers use a compound predicate:

subscribe_if(|peer| {
    peer.tags().contains(&Tag::from("proposer"))
    || peer.tags().contains(&Tag::from("proposer-next"))
    || peer.tags().contains(&Tag::from("proposer-soon"))
})

By the time a validator's slot starts, stream connections have been warm for 1000ms.

Layer 3: Per-Validator Raft Clusters

Each validator is a 3-node Raft cluster, not a single node:

Validator A Cluster:  [A0 leader] [A1 follower] [A2 follower]
Validator B Cluster:  [B0 leader] [B1 follower] [B2 follower]
Validator C Cluster:  [C0 leader] [C1 follower] [C2 follower]

Benefits:

  • Failover: If the cluster leader crashes, a follower takes over via Raft election. The validator never misses its proposer slot.
  • Load balancing: Followers serve read queries (CheckTx validation, pending count) at Consistency::Weak.
  • Stream stability: All cluster nodes share the same tags. If the leader fails, stream connections remain on surviving nodes -- zero reconnection delay.

This extends CometBFT's existing sentry node topology: sentries protect against DDoS, but if the validator process crashes, manual intervention is needed. Raft clusters automate failover.

Implementation: zmanian/cometbft-mempool

Mosaik API Patterns Learned

  1. Group is not Clone -- can't share group handles across tasks. Design one node as the group operator, others communicate via streams.
  2. Create producers before discovery -- if peers discover each other before stream producers exist, catalog entries won't include stream IDs. Discovery sync will miss the streams.
  3. Tag propagation via feed() is local-only -- other nodes need explicit sync_with() or wait for the gossip cycle (~15s) to see tag changes.
  4. Consistency::Strong queries only work on the Raft leader -- use Consistency::Weak on followers after waiting for committed index to catch up.

Comparison: Real CometBFT vs This Design

Aspect                       CometBFT Status Quo                This Design
Tx routing                   Flood gossip to all validators     Priority routing to proposer + next 2
Bandwidth                    O(validators * txs)                O(3 * txs) for proposer tier
Proposer mempool freshness   Depends on gossip convergence      Direct delivery, always freshest
Validator failover           Manual restart                     Automatic Raft election (~500ms)
Tx durability                Independent per node, can be lost  Raft-replicated within cluster
Leader transition            2s+ timeout if proposer fails      Pre-connected, sub-500ms
Code change in CometBFT      N/A                                ~200 lines (4 files modified)

Commonware / Simplex Alternative

Commonware's Simplex consensus achieves 2-hop block times (vs CometBFT's 6-delta view changes), and its threshold BLS signatures produce ~240-byte certificates regardless of validator-set size. The same proposer-aware routing concept applies: Simplex uses view-based leader rotation, which is equally deterministic and pre-computable.

Implementation: zmanian/commonware-mempool

What This Doesn't Change

  • Consensus protocol (Tendermint BFT / Simplex) is unchanged
  • Block format, ABCI interface, state machine are unchanged
  • ProcessProposal still works because block proposals include full transactions
  • Safety: all validators still eventually receive all transactions (soft delay, not hard filter)
  • Liveness: round-change fallback floods to all peers if proposer fails

Open Questions

  • Peer-to-validator address mapping: CometBFT P2P node keys differ from validator addresses. The mapping is deployment-topology dependent and needs a pluggable resolver.
  • Sentry node awareness: In production topologies where validators sit behind sentries, the priority routing needs to propagate through sentries or sentries need their own schedule awareness.
  • ADR-118 mempool lanes interaction: How does proposer-aware routing compose with the upcoming lane-based mempool? Lanes could provide the priority mechanism natively.