Research based on an analysis of the Monad L1 execution codebase.
Monad achieves high throughput through several key techniques that can be adopted by Ethereum clients:
| Technique | Expected Impact |
|---|---|
| Segregated I/O rings | +15-20% RPC throughput |
| Memory-bounded node cache | +10-15% cache hit rate |
| Multi-pool RPC executor | +25-40% p99 latency reduction |
| Parallel signature recovery | 5-10x faster block preparation |
| VersionStack rollback | O(1) reorg cost vs O(n) state cloning |
| Zone-aware NVMe storage | 2-4x faster cold reads |
| Fiber-based execution | 4-8x more concurrency per thread |
Source: category/mpt/node_cache.hpp
Monad tracks actual memory usage in bytes, not just entry counts:

```cpp
class NodeCache final {
    size_t max_bytes_;
    size_t used_bytes_{0}; // Tracks real memory consumption

    void evict_until_under_limit() {
        while (used_bytes_ > max_bytes_ && !active_list_.empty()) {
            auto const list_it = std::prev(active_list_.end());
            used_bytes_ -= list_it->val.second; // Subtract actual size
            // evict...
        }
    }
};
```

Why it matters: Prevents OOM crashes while maximizing cache utility. Variable node sizes (~104 bytes on average) make count-based limits unreliable.
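For illustration, a minimal sketch of byte-bounded eviction in Rust, assuming the `lru` crate; names and the insert API are illustrative, not Monad's:

```rust
use lru::LruCache;
use std::hash::Hash;

// Hypothetical byte-bounded cache: eviction is driven by real memory
// footprint rather than entry count.
struct ByteBoundedCache<K: Hash + Eq, V> {
    max_bytes: usize,
    used_bytes: usize,
    entries: LruCache<K, (V, usize)>, // (value, size_in_bytes)
}

impl<K: Hash + Eq, V> ByteBoundedCache<K, V> {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: LruCache::unbounded() }
    }

    fn insert(&mut self, key: K, value: V, size: usize) {
        // Replacing an entry first releases its old footprint.
        if let Some((_, old_size)) = self.entries.put(key, (value, size)) {
            self.used_bytes -= old_size;
        }
        self.used_bytes += size;
        // Evict from the cold end until back under the byte budget.
        while self.used_bytes > self.max_bytes {
            match self.entries.pop_lru() {
                Some((_, (_, sz))) => self.used_bytes -= sz,
                None => break,
            }
        }
    }
}
```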
Adoption for Reth:
```rust
struct NodeCache {
    max_bytes: usize,
    used_bytes: AtomicUsize,
    entries: DashMap<B256, (Node, usize)>, // (node, size_in_bytes)
}
```

Adoption for Geth:
```go
type NodeCache struct {
    maxBytes  uint64
    usedBytes atomic.Uint64
    entries   sync.Map // hash -> (node, sizeBytes)
}
```

Source: category/mpt/db.hpp
Monad uses separate io_uring instances for reads and writes:

```cpp
struct AsyncIOContext {
    io::Ring read_ring;                 // Dedicated read ring
    std::optional<io::Ring> write_ring; // Separate write ring
    async::AsyncIO io;
};
```

Why it matters: Write-heavy workloads (state commits, receipts) don't block read operations (RPC queries).
Adoption for Reth (using io-uring crate):
```rust
struct AsyncDbContext {
    read_ring: IoUring,
    write_ring: IoUring,
    read_buffer_pool: RegisteredBufferPool,
}
```
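A minimal sketch of the two-ring setup with the io-uring crate; queue depths and the read helper are illustrative assumptions, not Reth code:

```rust
use io_uring::{opcode, types, IoUring};
use std::os::unix::io::RawFd;

// One ring per direction: bursts of state commits queue on write_ring
// and never sit in front of latency-sensitive RPC reads.
fn build_rings() -> std::io::Result<(IoUring, IoUring)> {
    let read_ring = IoUring::new(512)?; // sized for many concurrent reads
    let write_ring = IoUring::new(128)?; // commits are batched, fewer entries
    Ok((read_ring, write_ring))
}

// Submit a single read on the read ring and wait for its completion.
fn read_at(ring: &mut IoUring, fd: RawFd, buf: &mut [u8], offset: u64) -> std::io::Result<()> {
    let entry = opcode::Read::new(types::Fd(fd), buf.as_mut_ptr(), buf.len() as u32)
        .offset(offset)
        .build();
    unsafe { ring.submission().push(&entry).expect("submission queue full") };
    ring.submit_and_wait(1)?;
    Ok(())
}
```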
Source: category/async/storage_pool.hpp

Monad leverages Linux zonefs for sequential write zones:
- Conventional zones: ~70μs latency (block device emulation)
- Sequential zones: ~15-30μs latency (direct access, no FTL overhead)
- Automatic fallback: Emulates zones using 256MB chunks if zonefs unavailable
Adoption: Check whether the NVMe device supports ZNS and, if so, mount it with zonefs for state storage; fall back gracefully on regular SSDs (a probe sketch follows below).
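As a sketch, the kernel's zone model can be probed via sysfs before choosing a layout; the device name here is illustrative:

```rust
use std::fs;

// The kernel reports the zone model at /sys/block/<dev>/queue/zoned:
// "none", "host-aware", or "host-managed".
fn zone_model(dev: &str) -> std::io::Result<String> {
    Ok(fs::read_to_string(format!("/sys/block/{dev}/queue/zoned"))?
        .trim()
        .to_string())
}

fn main() -> std::io::Result<()> {
    match zone_model("nvme0n1")?.as_str() {
        // ZNS available: mount with zonefs and write state sequentially
        "host-managed" | "host-aware" => println!("use zonefs-backed zones"),
        // Regular SSD: emulate zones with fixed-size chunks instead
        _ => println!("fall back to conventional chunked layout"),
    }
    Ok(())
}
```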
Source: category/async/io.hpp
Zero-allocation I/O via pre-registered buffers:

```cpp
// Config from ondisk_db_config.hpp
unsigned rd_buffers{1024};              // 1024 * 8KB = 8MB pool
unsigned uring_entries{128};            // Concurrent I/O ops
unsigned concurrent_read_io_limit{600}; // Max inflight reads
```

Why it matters: Eliminates per-I/O kernel page-table lookups; each registered buffer can be used directly by the kernel.
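A sketch of buffer registration with the io-uring crate; once registered, fixed-buffer reads reference buffers by index so the kernel skips per-I/O page mapping. Pool sizes mirror the config above; the function names are illustrative:

```rust
use io_uring::IoUring;

// Register a pool of fixed 8KB buffers once at startup; they must stay
// alive (and never be reallocated) while they remain registered.
fn register_pool(ring: &IoUring, bufs: &mut Vec<Vec<u8>>) -> std::io::Result<()> {
    let iovecs: Vec<libc::iovec> = bufs
        .iter_mut()
        .map(|b| libc::iovec {
            iov_base: b.as_mut_ptr().cast(),
            iov_len: b.len(),
        })
        .collect();
    // Safety: the buffers outlive the registration and do not move.
    unsafe { ring.submitter().register_buffers(&iovecs) }
}

fn main() -> std::io::Result<()> {
    let ring = IoUring::new(128)?; // uring_entries
    let mut bufs = vec![vec![0u8; 8 * 1024]; 1024]; // rd_buffers * 8KB
    register_pool(&ring, &mut bufs)
}
```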
Source: category/rpc/monad_executor.h
Monad uses three separate fiber pools based on operation cost:

```cpp
struct monad_executor_pool_config {
    unsigned num_threads; // OS threads in pool
    unsigned num_fibers;  // Lightweight fibers per thread
    unsigned timeout_sec; // Queue timeout
    unsigned queue_limit; // Max queued requests
};

// Three pools:
// 1. low_gas_pool:     eth_call < 8.1M gas (low latency)
// 2. high_gas_pool:    eth_call >= 8.1M gas (higher latency tolerance)
// 3. trace_block_pool: debug_traceBlock (highest latency tolerance)
```

Why it matters: Small eth_call requests shouldn't queue behind trace operations.
Adoption for Reth:
```rust
struct RpcExecutor {
    low_gas_pool: ThreadPool,  // eth_call < 8.1M gas
    high_gas_pool: ThreadPool, // eth_call >= 8.1M gas
    trace_pool: ThreadPool,    // debug_* methods
}

impl RpcExecutor {
    // debug_* requests go straight to trace_pool; this helper only
    // splits eth_call traffic by gas limit.
    fn route_call(&self, gas_limit: u64) -> &ThreadPool {
        if gas_limit < 8_100_000 {
            &self.low_gas_pool
        } else {
            &self.high_gas_pool
        }
    }
}
```

Adoption for Geth:
```go
type RPCExecutor struct {
    lowGasPool  *WorkerPool // eth_call < 8.1M gas
    highGasPool *WorkerPool // eth_call >= 8.1M gas
    tracePool   *WorkerPool // debug_traceBlock
}

func (e *RPCExecutor) Submit(req *RPCRequest) {
    switch {
    case req.IsTrace():
        e.tracePool.Submit(req)
    case req.GasLimit < 8_100_000:
        e.lowGasPool.Submit(req)
    default:
        e.highGasPool.Submit(req)
    }
}
```

Source: category/rpc/monad_executor.h
Monad applies state overrides in place without cloning the entire state:

```c
void add_override_address(struct monad_state_override *, uint8_t const *addr);
void set_override_balance(struct monad_state_override *, uint8_t const *addr,
                          uint8_t const *balance);
void set_override_state_diff(struct monad_state_override *, uint8_t const *addr,
                             uint8_t const *key, uint8_t const *value);
```

Why it matters: Concurrent eth_call simulations avoid per-call copies of the state, keeping heap-allocation overhead near zero.
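The same idea can be sketched in Rust as a thin per-call overlay consulted before a shared, read-only base state; all names here are hypothetical, not Monad's API:

```rust
use std::collections::HashMap;

type Address = [u8; 20];
type Word = [u8; 32];

trait StateReader {
    fn balance(&self, addr: &Address) -> Word;
    fn storage(&self, addr: &Address, key: &Word) -> Word;
}

// Per-call override layer: only the overridden entries are stored,
// so concurrent eth_call simulations never clone the base state.
struct OverriddenState<'a, S> {
    base: &'a S,
    balances: HashMap<Address, Word>,
    storage_diff: HashMap<(Address, Word), Word>,
}

impl<'a, S: StateReader> StateReader for OverriddenState<'a, S> {
    fn balance(&self, addr: &Address) -> Word {
        self.balances
            .get(addr)
            .copied()
            .unwrap_or_else(|| self.base.balance(addr))
    }

    fn storage(&self, addr: &Address, key: &Word) -> Word {
        self.storage_diff
            .get(&(*addr, *key))
            .copied()
            .unwrap_or_else(|| self.base.storage(addr, key))
    }
}
```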
Source: category/execution/ethereum/state3/version_stack.hpp
Monad versions mutable state with a copy-on-write stack keyed by version number:

```cpp
template <class T>
class VersionStack {
    std::deque<std::pair<unsigned, T>> stack_; // (version, value) pairs

    T &current(unsigned const version) {
        if (version > stack_.back().first) {
            T value = stack_.back().second;
            stack_.emplace_back(version, std::move(value));
        }
        return stack_.back().second;
    }

    void pop_accept(unsigned const version) {
        // Efficiently merge consecutive versions
        if (version == stack_.back().first) {
            auto const size = stack_.size();
            if (size > 1 && stack_[size - 2].first + 1 == stack_[size - 1].first) {
                stack_[size - 2].second = std::move(stack_[size - 1].second);
                stack_.pop_back();
            }
        }
    }
};
```

Uses immutable persistent data structures (the immer library) for logs:

```cpp
VersionStack<immer::vector<Receipt::Log>> logs_; // Structural sharing
```

Why it matters: O(1) rollback instead of O(n) state cloning during reorgs.
Adoption for Reth:
```rust
use im::Vector; // Persistent immutable vector
use std::collections::VecDeque;

struct VersionStack<T: Clone> {
    stack: VecDeque<(u32, T)>, // (version, value)
}

// Use im::HashMap for state, im::Vector for logs
type Logs = im::Vector<Log>;
```

Monad's state layer pairs the version stacks with two maps:

```cpp
Map<Address, OriginalAccountState> original_{};      // Initial block state
Map<Address, VersionStack<AccountState>> current_{}; // Modified state
```

Keeping the untouched block-start state next to the modified state enables an efficient cold/warm storage access distinction for accurate gas calculations.
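A minimal sketch of the copy-on-write and O(1) rollback semantics, restating the struct above for self-containment; the method names are hypothetical:

```rust
use std::collections::VecDeque;

struct VersionStack<T: Clone> {
    stack: VecDeque<(u32, T)>, // (version, value)
}

impl<T: Clone> VersionStack<T> {
    fn new(base: T) -> Self {
        let mut stack = VecDeque::new();
        stack.push_back((0, base)); // seeded base version
        Self { stack }
    }

    // Copy-on-write: snapshot the value the first time a version touches it.
    fn current_mut(&mut self, version: u32) -> &mut T {
        let top = self.stack.back().expect("seeded").0;
        if top < version {
            let value = self.stack.back().unwrap().1.clone();
            self.stack.push_back((version, value));
        }
        &mut self.stack.back_mut().unwrap().1
    }

    // Rejecting a version is a single pop: O(1) no matter how large
    // the underlying value is.
    fn pop_reject(&mut self, version: u32) {
        if self.stack.len() > 1 && self.stack.back().map(|e| e.0 == version).unwrap_or(false) {
            self.stack.pop_back();
        }
    }
}
```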
Source: category/execution/ethereum/execute_block.cpp
Sender recovery fans out one ecrecover per transaction across a fiber priority pool:

```cpp
std::vector<std::optional<Address>> recover_senders(
    std::span<Transaction const> const transactions,
    fiber::PriorityPool &priority_pool)
{
    std::vector<std::optional<Address>> senders{transactions.size()};
    auto promises = std::make_shared<boost::fibers::promise<void>[]>(
        transactions.size());
    for (unsigned i = 0; i < transactions.size(); ++i) {
        priority_pool.submit(i, [i, &senders, &transactions, promises] {
            senders[i] = recover_sender(transactions[i]);
            promises[i].set_value();
        });
    }
    // Wait for every recovery to land before returning
    for (unsigned i = 0; i < transactions.size(); ++i) {
        promises[i].get_future().wait();
    }
    return senders;
}
```

Why it matters: 5-10x faster block preparation by parallelizing ecrecover.
Adoption for Reth:
```rust
use futures::future::join_all;

async fn recover_senders(txs: &[Transaction]) -> Vec<Option<Address>> {
    let handles: Vec<_> = txs
        .iter()
        .cloned() // each task needs owned data ('static), so clone the tx
        // ecrecover is CPU-bound, so spawn_blocking keeps async workers free
        .map(|tx| tokio::task::spawn_blocking(move || tx.recover_signer()))
        .collect();
    join_all(handles)
        .await
        .into_iter()
        .map(|r| r.ok().flatten())
        .collect()
}
```

Source: category/mpt/traverse.hpp
Trie traversal takes an explicit concurrency cap:

```cpp
bool traverse(
    NodeCursor const &, TraverseMachine &, uint64_t block_id,
    size_t concurrency_limit = 4096); // Max 4096 concurrent reads
```

Why it matters: Prevents I/O storms while still enabling parallelism.
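The same bound can be expressed in Rust with a semaphore, assuming tokio; read_node is a placeholder for the actual disk read:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// A semaphore caps in-flight reads so a wide trie fan-out cannot
// flood the device with thousands of simultaneous I/Os.
async fn traverse_children(children: Vec<u64>, concurrency_limit: usize) {
    let sem = Arc::new(Semaphore::new(concurrency_limit));
    let mut tasks = Vec::new();
    for child in children {
        // Waits here once `concurrency_limit` reads are already in flight.
        let permit = sem.clone().acquire_owned().await.unwrap();
        tasks.push(tokio::spawn(async move {
            let _permit = permit; // released when the read finishes
            read_node(child).await;
        }));
    }
    for t in tasks {
        let _ = t.await;
    }
}

async fn read_node(_id: u64) { /* placeholder for an actual disk read */ }
```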
Source: category/core/fiber/fiber_thread_pool.hpp
- ~100KB stack per fiber vs ~2MB per OS thread
- Cooperative scheduling within OS thread
- Work stealing via TBB concurrent_priority_queue
Adoption for Reth: Use tokio::task with a custom executor for priority scheduling (see the sketch after these notes).
Adoption for Geth: Use bounded goroutine pools with priority channels.
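For the Reth note, a sketch of priority-aware draining with tokio's biased select; the queue wiring and Job type are illustrative:

```rust
use tokio::sync::mpsc;

type Job = Box<dyn FnOnce() + Send>;

// Two queues drained with biased polling: high-priority work is
// always picked first, mimicking fiber priorities on cheap tasks.
async fn worker(mut high: mpsc::Receiver<Job>, mut low: mpsc::Receiver<Job>) {
    loop {
        tokio::select! {
            biased;
            Some(job) = high.recv() => job(),
            Some(job) = low.recv() => job(),
            else => break, // both queues closed and drained
        }
    }
}
```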
Monad uses ankerl::unordered_dense::segmented_map everywhere:
- Cache-friendly iteration
- Lower memory fragmentation
- Faster for most Ethereum workloads
Adoption for Reth: Use hashbrown with a custom hasher (see the sketch below).
Adoption for Geth: Use swiss map implementations.
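For the Reth side, a minimal sketch; the crate choices are illustrative assumptions:

```rust
use hashbrown::HashMap;
use rustc_hash::FxHasher;
use std::hash::BuildHasherDefault;

// Swiss-table map with a cheap hasher for short fixed-size keys such
// as node hashes; hashbrown keeps probing cache-friendly.
type NodeMap<V> = HashMap<[u8; 32], V, BuildHasherDefault<FxHasher>>;

fn main() {
    let mut nodes: NodeMap<Vec<u8>> = NodeMap::default();
    nodes.insert([0u8; 32], vec![1, 2, 3]);
    assert!(nodes.contains_key(&[0u8; 32]));
}
```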
From Monad's ondisk_db_config.hpp:

```cpp
struct OnDiskDbConfig {
    bool eager_completions{false};           // Poll completions eagerly
    unsigned concurrent_read_io_limit{1024}; // Max inflight reads
    unsigned uring_entries{512};             // Submission queue size
    uint64_t node_lru_max_mem{100ul << 20};  // 100MB L1 cache
};
```

Hardware recommendations:
- CPU: x86-64-v3 minimum (Haswell or newer) for crypto ops
- RAM: 32GB+ (100MB L1 cache + 8GB+ buffer pools)
- Storage: NVMe with zonefs if available
- Network: Separate NICs for consensus vs RPC if possible
For maximum impact, roughly ordered from least to most implementation effort:
- Multi-pool RPC executor - Easy to implement, immediate p99 improvement
- Parallel signature recovery - Drop-in optimization
- Memory-bounded cache - Prevents OOM, improves stability
- Segregated I/O rings - Requires io_uring refactor but high payoff
- VersionStack pattern - Requires state layer changes