CometBFT uses flood gossip for mempool propagation: every validator receives every transaction regardless of whether it is the current block proposer. In a 100-validator network, 99 validators maintain redundant copies of the mempool that they will never use to build a block during the current slot. This wastes bandwidth and creates unnecessary load.
The proposer schedule in CometBFT is fully deterministic. All validators independently compute the same proposer for any (height, round) pair using weighted round-robin -- no communication needed. This means we can route transactions preferentially to upcoming proposers without any new consensus messages.
```
Tx Sources ──► [Priority Routing] ──► Current Proposer (+ next 2)
                                  └─► Other Validators (delayed)
```
Layer 1: ProposerOracle Interface (Go, in CometBFT)
Expose the existing proposer schedule to the mempool reactor:
```go
type ProposerOracle interface {
	ProposerAt(height int64, round int32) types.Address
	UpcomingProposers(height int64, n int) []types.Address
	IsUpcomingProposer(addr types.Address, lookahead int) bool
	CurrentHeight() int64
	CurrentRound() int32
}
```

The implementation wraps types.ValidatorSet with thread-safe state updates. The consensus module already computes this -- we just need to pipe it to the mempool.
Modify broadcastTxRoutine with soft priority (not hard filter):
```go
if memR.proposerOracle != nil && memR.peerToValidator != nil {
	validatorAddr := memR.peerToValidator(peer.ID())
	if !memR.proposerOracle.IsUpcomingProposer(validatorAddr, 3) {
		time.Sleep(50 * time.Millisecond) // deprioritize, don't block
	}
}
```

All peers still receive all transactions; proposers just get them faster. Backward compatible: a nil oracle preserves the existing flood behavior.
Implementation: zmanian/cometbft@feature/proposer-aware-mempool
Layer 2: Pipeline Pre-Connection (Mosaik demo)
Reactive tag-based routing can't keep up with fast BFT rotation (500ms slots). The tag-propagation + stream-reconnection cycle takes ~100-400ms. Solution: pre-tag upcoming proposers using the deterministic schedule.
3-tier tag system:
| Tag | Meaning | Lead Time |
|---|---|---|
| `proposer` | Current slot's block builder | 0ms |
| `proposer-next` | Next slot's proposer | 500ms |
| `proposer-soon` | Slot+2 proposer | 1000ms |
Stream consumers use a compound predicate:
```rust
subscribe_if(|peer| {
    peer.tags().contains(&Tag::from("proposer"))
        || peer.tags().contains(&Tag::from("proposer-next"))
        || peer.tags().contains(&Tag::from("proposer-soon"))
})
```

By the time a validator's slot starts, its stream connections have been warm for 1000ms.
Layer 3: Per-Validator Raft Clusters
Each validator is a 3-node Raft cluster, not a single node:
```
Validator A Cluster: [A0 leader] [A1 follower] [A2 follower]
Validator B Cluster: [B0 leader] [B1 follower] [B2 follower]
Validator C Cluster: [C0 leader] [C1 follower] [C2 follower]
```
Benefits:
- Failover: If the cluster leader crashes, a follower takes over via Raft election. The validator never misses its proposer slot.
- Load balancing: Followers serve read queries (CheckTx validation, pending count) at `Consistency::Weak`.
- Stream stability: All cluster nodes share the same tags. If the leader fails, stream connections remain on surviving nodes -- zero reconnection delay.
This extends CometBFT's existing sentry node topology: sentries protect against DDoS, but if the validator process crashes, manual intervention is needed. Raft clusters automate failover.
Implementation: zmanian/cometbft-mempool
- `Group` is not `Clone` -- group handles can't be shared across tasks. Designate one node as the group operator; the others communicate via streams.
- Create producers before discovery -- if peers discover each other before stream producers exist, catalog entries won't include stream IDs, and discovery sync will miss the streams.
- Tag propagation via `feed()` is local-only -- other nodes need an explicit `sync_with()`, or must wait for the gossip cycle (~15s), to see tag changes.
- `Consistency::Strong` queries only work on the Raft leader -- use `Consistency::Weak` on followers after waiting for the committed index to catch up.
| Aspect | CometBFT Status Quo | This Design |
|---|---|---|
| Tx routing | Flood gossip to all validators | Priority routing to proposer + next 2 |
| Bandwidth | O(validators * txs) | O(3 * txs) for proposer tier |
| Proposer mempool freshness | Depends on gossip convergence | Direct delivery, always freshest |
| Validator failover | Manual restart | Automatic Raft election (~500ms) |
| Tx durability | Independent per-node, can be lost | Raft-replicated within cluster |
| Leader transition | 2s+ timeout if proposer fails | Pre-connected, sub-500ms |
| Code change in CometBFT | N/A | ~200 lines (4 files modified) |
Commonware's Simplex consensus achieves 2-hop block times (vs CometBFT's 6-delta view changes), and its threshold BLS signatures produce ~240-byte certificates regardless of validator-set size. The same proposer-aware routing concept applies: Simplex uses view-based leader rotation, which is equally deterministic and pre-computable.
Implementation: zmanian/commonware-mempool
- Consensus protocol (Tendermint BFT / Simplex) is unchanged
- Block format, ABCI interface, state machine are unchanged
- ProcessProposal still works because block proposals include full transactions
- Safety: all validators still eventually receive all transactions (soft delay, not hard filter)
- Liveness: round-change fallback floods to all peers if proposer fails
- Peer-to-validator address mapping: CometBFT P2P node keys differ from validator addresses. The mapping is deployment-topology dependent and needs a pluggable resolver.
- Sentry node awareness: In production topologies where validators sit behind sentries, the priority routing needs to propagate through sentries or sentries need their own schedule awareness.
- ADR-118 mempool lanes interaction: How does proposer-aware routing compose with the upcoming lane-based mempool? Lanes could provide the priority mechanism natively.