@denniswon
Created March 26, 2026 22:34
Plan: High-Throughput Batch Transaction Submission
❯ Load Test Report:
Configuration:
- Users: 10
- Spawn rate: 10 users/sec (all spawn immediately)
- Pacing: 15 seconds between requests per user
- Duration: 5 minutes
- Effective throughput: ~10 requests every 15 seconds (~40 req/min)
HTTP Layer — all 200 requests succeeded, median 2.8s, p95 3.5s
On-chain Submit (RPC → NewTaskCreated)
- 141/200 seen (70.5%)
- p50: 50s · p95: 84s · p99: 94s
End-to-End (RPC → on-chain response)
- 82/200 completed (41%)
- p50: 138s · p95: 192s · p99: 204s
Timeouts: 129/200 tasks (64.5%) hit the 180s response timeout
Takeaway: Gateway is healthy (sub-4s HTTP). The bottleneck is on-chain — only 41% of tasks got a full response within 3 min. Median e2e for successful tasks was ~2.3 min with p50 submit latency at ~50s, pointing to backpressure at the tx submission layer.
there is a serious reliability issue with the transaction submission worker @crates/gateway/src/rpc/api/tx_worker.rs. Deeply analyze the architectural structure of @crates/gateway, focusing on the on-chain task creation and task response submission flow, and explore approaches to massively improve this reliability issue from first principles. Follow guides and instructions from @.claude/
⎿  Read crates/gateway/src/rpc/api/tx_worker.rs (597 lines)
⎿  Listed directory .claude/
⏺ This is a critical performance issue. The load test data tells a clear story: the gateway's HTTP layer is fine (sub-4s), but the single-threaded TransactionSubmissionWorker is creating a fatal bottleneck — 59% of tasks never complete end-to-end.
Let me deeply explore the architecture before proposing solutions.
⏺ 3 Explore agents finished (ctrl+o to expand)
├─ Explore tx submission flow · 35 tool uses · 114.9k tokens
│  ⎿ Done
├─ Explore AvsWriter nonce mgmt · 22 tool uses · 118.9k tokens
│  ⎿ Done
└─ Explore gateway config and flow · 29 tool uses · 106.3k tokens
⎿  Done
❯ there is a serious reliability issue with the transaction submission worker @crates/gateway/src/rpc/api/tx_worker.rs. Deeply analyze the architectural structure of @crates/gateway, focusing on the on-chain task creation and task response submission flow, and explore approaches to massively improve this reliability issue from first principles. Follow guides and instructions from @.claude/ /clarify-plan
⏺ Skill(/clarify-plan)
⎿  Successfully loaded skill
⏺ Good — I've completed a deep exploration of the gateway's transaction submission architecture. Here's what I found before asking questions:
★ Insight ─────────────────────────────────────
The core bottleneck is architectural, not configurational. The TransactionSubmissionWorker processes ALL on-chain transactions (both createNewTask AND respondToTask) through a single sequential loop — one tokio task, one mpsc channel, one-at-a-time processing. Each transaction blocks on get_receipt() (12-60s waiting for block confirmation). At 40 req/min the load test generates 80 txs/min, but the worker drains only ~2-4 txs/min, so the queue is fundamentally oversubscribed by 20-40x.
─────────────────────────────────────────────────
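The 20-40x figure is just the ratio of arrival rate to drain rate; a minimal runnable check of that arithmetic, using only numbers from the load test (function name illustrative):

```rust
// Sanity check of the oversubscription figure: 40 req/min, 2 txs per
// task, and a drain rate of 2-4 txs/min (one blocking get_receipt()
// per tx at 12-60s each).
fn oversubscription(arrival_txs_per_min: f64, drain_txs_per_min: f64) -> f64 {
    arrival_txs_per_min / drain_txs_per_min
}

fn main() {
    let arrival = 40.0 * 2.0; // 80 txs/min entering the queue
    println!("best-case drain:  {}x", oversubscription(arrival, 4.0));
    println!("worst-case drain: {}x", oversubscription(arrival, 2.0));
}
```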
Here are my first three questions:
1. Which chain was this load test running against, and what's the target chain for production?
Block time matters enormously here — Base mainnet (2s blocks) vs Sepolia (12s) vs Base Sepolia (2s) changes the theoretical max throughput by 6x. And the nonce pipelining strategy differs by chain's mempool behavior.
2. Is the single-signer constraint (one ECDSA key for both createNewTask and respondToTask) a hard requirement, or can the gateway operate with multiple signer keys?
The current serialization exists solely to prevent nonce conflicts on a shared signer. If we can use N signers (e.g., a pool of hot wallets), we get N× parallelism trivially. If not, we need nonce pipelining (submit tx N+1 before tx N confirms), which is more complex but still achievable.
3. What's the target throughput and latency you need?
The load test hit 40 req/min and broke at 41% completion. Are we aiming for 40 req/min sustained with >99% completion? 100 req/min? 1000? And what's the acceptable e2e latency — is the current 2.8s HTTP response fine (since on-chain is async), or do callers need on-chain confirmation within a specific window?
✻ Brewed for 2m 59s
❯ 1. sepolia, for a prod environment with 10 gateway replicas and 10 task generator signer private keys, 1 for each 2. not a hard requirement I think, but please review the code base (including the Solidity contract code for createNewTask and respondToTask to confirm my understanding): the gateway can operate with multiple signer keys, but all task generators have to be configured onchain (createNewTask and respondToTask are guarded with the onlyTaskGenerator guard). If my understanding is not correct though, then nonce pipelining would be needed 3. we are targeting at least 100 RPS (100 requests per second). Acceptable e2e latency for the offchain response is around 3 seconds or lower, ideally around 2 seconds. For onchain, we have more room, but we definitely want onchain e2e to happen within 1 min of request submission to the gateway, ideally within 15-30 seconds (on Ethereum L1/Sepolia)
⏺ 2 Explore agents finished (ctrl+o to expand)
├─ Explore TaskManager access control · 21 tool uses · 90.8k tokens
│  ⎿ Done
└─ Explore nonce pipelining feasibility · 28 tool uses · 109.1k tokens
⎿  Done
⏺ Excellent findings. Let me synthesize what we now know before asking the next batch of questions.
★ Insight ─────────────────────────────────────
Multiple task generators are fully supported on-chain. The OperatorRegistry uses EnumerableSet.AddressSet — both createNewTask and respondToTask use onlyTaskGenerator, and the Sepolia config already has 2 generators configured. The old aggregator role is deprecated. This means the multi-signer parallelism path is wide open — no nonce pipelining complexity needed.
The math for 100 RPS on Sepolia (12s blocks): Each task needs 2 txs (create + respond). At 100 RPS, that's 200 txs/sec, or 2,400 txs per 12s block. Each signer can land ~1 tx per block optimistically, or ~5 txs/block with nonce pipelining, so with N signers doing round-robin we need enough signers to absorb the throughput. Sepolia's block gas limit and mempool behavior also constrain this.
─────────────────────────────────────────────────
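To make the scale concrete, here is the same arithmetic as a runnable sketch, using the assumed figures above (100 tasks/sec target, 2 txs per task, 12s blocks, ~5 txs per signer per block with pipelining):

```rust
// Fleet math: how many signers would individual (non-batched)
// submission need on Sepolia? All inputs are assumptions from the
// discussion, not measurements.
fn signers_needed(
    tasks_per_sec: u64,
    txs_per_task: u64,
    block_time_s: u64,
    txs_per_signer_per_block: u64,
) -> u64 {
    (tasks_per_sec * txs_per_task * block_time_s).div_ceil(txs_per_signer_per_block)
}

fn main() {
    // Even with aggressive nonce pipelining (5 txs/signer/block),
    // absorbing 2,400 txs per block needs 480 signers.
    println!("{}", signers_needed(100, 2, 12, 5));
}
```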
Batch 2 — Architecture & Deployment Constraints:
4. You mentioned 10 gateway replicas with 10 task generator signer keys. Is each replica currently assigned exactly 1 signer key, or do all 10 replicas share the same key today?
This determines whether we're solving a within-replica parallelism problem (single replica needs multiple signers) or a cross-replica coordination problem (multiple replicas competing for the same nonce space). The 10-key setup suggests you've already anticipated this — I want to confirm the current deployment topology.
5. For the 100 RPS target — is that 100 newt_createTask calls/sec, or 100 total RPC calls/sec across all methods?
At 100 task creations/sec on Sepolia (12s blocks), we'd need to land ~1,200 createNewTask + ~1,200 respondToTask = 2,400 txs per 12-second block. That's aggressive for a single chain. Are you planning to batch tasks, or is each RPC call = 1
on-chain task?
6. Is the respondToTask submission latency-critical, or only createNewTask?
Currently both flow through the same sequential queue. If respondToTask can be more relaxed (e.g., batched every few seconds), we could prioritize createNewTask in a fast lane and batch responses separately — significantly reducing queue contention.
✻ Cooked for 2m 12s
❯ 4. each replica is currently assigned exactly 1 signer key. But this can change if needed. For the deploy pipeline for stagef vs prod, you can refer to @~/projects/deploy 5. 100 newt_createTask calls/sec 6. Neither has a hard latency requirement, but for sure we don't want to incur too much delay or discrepancy between the offchain task/task response and onchain. And if possible, yes, we should explore a batching mechanism so that we can, for example, batch-submit task creation/response every interval (that will be our "block/batch" time). I initially made it the same sequential queue in order to ensure that, for a single task, the task response NEVER gets submitted if the task has not been created successfully. There is no difference between createNewTask vs respondToTask in terms of how latency critical they are
⎿  Listed directory ../deploy/
⏺ Great — the picture is becoming very clear. Let me ask one final batch to nail down the solution design.
★ Insight ─────────────────────────────────────
The math at 100 tasks/sec on Sepolia is extreme. 100 tasks/sec × 2 txs × 12s block time = 2,400 txs per block across the fleet. With 10 replicas at 1 signer each, that's 240 txs/block/replica. Individual transaction submission simply cannot scale to this — batching is mandatory. A BatchTaskManager contract registered as a task generator could batch N createNewTask calls into a single tx, reducing the problem by 1-2 orders of magnitude.
─────────────────────────────────────────────────
Batch 3 — Solution Constraints:
7. Is deploying a new batching contract (e.g., BatchTaskManager that calls taskManager.createNewTask in a loop) acceptable?
This is the highest-leverage change. A contract registered as a task generator that batches 50-100 tasks per transaction would cut the tx count by 50-100x. Same for respondToTask. The alternative — submitting 240 individual txs per block per replica — is infeasible on any EVM chain regardless of nonce pipelining. Is adding a new contract to the deployment pipeline acceptable, or are we restricted to off-chain-only changes?
8. For the create→respond ordering invariant: is it acceptable if creates and responds are in different batches submitted sequentially (batch of creates first, then batch of responds a few seconds later)?
This preserves your invariant (no response without a successful create) while allowing independent batching windows. The tradeoff is slightly higher latency for the respond step (waits for the next batch interval).
9. What's the gas budget per transaction on Sepolia/mainnet for your task generator signers?
Batching 100 createNewTask calls in one tx could cost 20-50M gas. Sepolia's block gas limit is 30M. If individual createNewTask costs ~200-300K gas, we can fit ~100 per batch. But if it costs 500K+, we're limited to ~60. Do you know the
approximate gas cost of a single createNewTask and respondToTask today?
✻ Crunched for 39s
❯ 7. deploying a new batching contract (e.g., BatchTaskManager that calls taskManager.createNewTask in a loop) is acceptable 8. yes, acceptable 9. I do not — is there a way to estimate?
⏺ Let me estimate gas costs from the existing test suite and contract code.
⏺ Bash(cd /Users/dennis.won/projects/newton-prover-avs/contracts && forge test --match-test "test_createTask\|test_respondToTask\|test_createNewTask" --gas-report -vv…)
⎿  Compiling 150 files with Solc 0.8.27
Solc 0.8.27 finished in 22.85s
Compiler run successful with warnings:
… +13 lines (ctrl+o to expand)
⎿  (timeout 2m)
⏺ Explore(Explore createNewTask gas usage)
⎿  Done (36 tool uses · 91.3k tokens · 59s)
(ctrl+o to expand)
⏺ Good — gas estimates from code analysis:
- createNewTask: ~100-150K gas
- respondToTask: ~400-800K gas (BLS verification dominates)
- Sepolia block gas limit: 30M
★ Insight ─────────────────────────────────────
Batching math is very favorable. At 150K gas/create, we can fit ~200 creates per batch tx. At 800K gas/respond, ~37 responds per batch. Per replica at 10 tasks/sec over a 12s block: 120 creates (1 batch tx) + 120 responds (3-4 batch txs) = ~5
txs per block per signer. That's trivially achievable. Batching turns a 240-tx-per-block problem into a 5-tx-per-block problem.
─────────────────────────────────────────────────
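The batching math above, as a runnable check (gas figures are the code-analysis estimates, not measured on-chain values):

```rust
// How many items fit in one batch tx, and how many batch txs a
// replica needs per block. 30M is Sepolia's block gas limit; 150K and
// 800K are the estimated per-item costs from the exploration above.
fn max_items_per_batch(block_gas_limit: u64, gas_per_item: u64) -> u64 {
    block_gas_limit / gas_per_item
}

fn main() {
    let creates_per_batch = max_items_per_batch(30_000_000, 150_000); // 200
    let responds_per_batch = max_items_per_batch(30_000_000, 800_000); // 37
    // Per replica per 12s block at 10 tasks/sec: 120 creates + 120 responds.
    let txs_per_block =
        120u64.div_ceil(creates_per_batch) + 120u64.div_ceil(responds_per_batch);
    println!("batch txs per block per replica: {txs_per_block}");
}
```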
I now have enough context to design the solution. Let me move to the Design phase.
⏺ Plan(Design batch tx submission plan)
⎿  Done (49 tool uses · 99.0k tokens · 3m 32s)
(ctrl+o to expand)
⏺ The plan from the design agent is comprehensive. Let me verify one critical detail before writing the final plan.
⎿  Loaded ../.claude/rules/solidity.md
Searched for 1 pattern, read 1 file (ctrl+o to expand)
⏺ Now I have the full picture. Let me read the rest of respondToTask to confirm the batch contract's parameter requirements, then write the final plan.
Read 1 file (ctrl+o to expand)
⏺ Good — I have the full picture. Now let me write the final plan.
⏺ Updated plan
⎿  /plan to preview
⏺ The plan is written. Here's the summary:
★ Insight ─────────────────────────────────────
The key insight is that batching transforms the problem entirely. Instead of 240 individual transactions competing for block space per replica per block (impossible), we get 5 batch transactions per block (trivial). The BatchTaskManager contract is the highest-leverage change — it reduces the on-chain tx count by ~50x. The Rust-side batch accumulator then collects tasks into time-windowed batches (2s intervals), and nonce pipelining ensures those 5 txs don't block sequentially on receipt confirmation. The entire solution is feature-gated behind batch_submitter.enabled, so the old sequential worker stays as fallback.
The ordering invariant (no respondToTask before createNewTask confirms) is preserved via a DashMap<TaskId, ()> shared between the create and respond accumulators — responds are held until their corresponding create is confirmed on-chain.
─────────────────────────────────────────────────
4 layers, 6 implementation steps:
1. BatchTaskManager.sol — batch N calls into 1 tx (Solidity, no Rust dependency)
2. AvsWriter batch methods — Rust bindings to call the batch contract
3. BatchTransactionSubmitter — replaces tx_worker.rs with time-windowed batch accumulation
4. Nonce pipelining — submit 5 batch txs concurrently, await receipts in parallel
5. Integration testing + load test at 100 tasks/sec
6. Multi-signer pool (only if needed — math says it won't be)
⏺ User approved Claude's plan
⎿  Plan saved to: ~/.claude/plans/valiant-giggling-moler.md · /plan to edit
Plan: High-Throughput Batch Transaction Submission
Context
Load testing at just 40 req/min revealed a critical bottleneck: TransactionSubmissionWorker processes ALL on-chain transactions (both createNewTask and respondToTask) through a single sequential loop, blocking on get_receipt() (12-60s per
tx on Sepolia). Only 41% of tasks completed end-to-end; 64.5% hit the 180s timeout.
Target: 100 tasks/sec sustained across 10 gateway replicas (10/sec/replica), with <3s offchain response and <30s on-chain e2e on Sepolia (12s blocks).
Root cause: The worker can process ~2-4 txs/min but needs 200 txs/sec. Throughput gap: ~3000x.
Solution: Four coordinated layers — a Solidity batch contract, a Rust batch accumulator, nonce pipelining, and an optional multi-signer pool.
---
Layer 1: BatchTaskManager Solidity Contract
What
A new standalone contract that batches N createNewTask and N respondToTask calls into single transactions. Registered as a task generator on OperatorRegistry.
Why
At 150K gas/create, one batch tx fits ~200 creates (30M block gas limit). At 800K gas/respond, ~37 per batch. Per replica per 12s block at 10 tasks/sec: 120 creates (1 batch tx) + 120 responds (3-4 batch txs) = 5 txs per block instead of
240. This is the highest-leverage change.
Design
Error semantics: try/catch per item (not revert-all). A single TaskAlreadyExists or TaskAlreadyResponded in a batch of 120 must not revert the other 119.
File: contracts/src/BatchTaskManager.sol
contract BatchTaskManager {
    INewtonProverTaskManager public immutable taskManager;

    // Per-item result events
    event BatchCreateResult(bytes32 indexed taskId, bool success, bytes reason);
    event BatchRespondResult(bytes32 indexed taskId, bool success, bytes reason);

    constructor(address _taskManager) {
        taskManager = INewtonProverTaskManager(_taskManager);
    }

    function batchCreateTasks(
        INewtonProverTaskManager.Task[] calldata tasks
    ) external returns (uint256 successCount) {
        for (uint256 i; i < tasks.length;) {
            try taskManager.createNewTask(tasks[i]) {
                emit BatchCreateResult(tasks[i].taskId, true, "");
                unchecked { ++successCount; }
            } catch (bytes memory reason) {
                emit BatchCreateResult(tasks[i].taskId, false, reason);
            }
            unchecked { ++i; }
        }
    }

    function batchRespondToTasks(
        INewtonProverTaskManager.Task[] calldata tasks,
        INewtonProverTaskManager.TaskResponse[] calldata responses,
        bytes[] calldata signatureDataArray
    ) external returns (uint256 successCount) {
        require(tasks.length == responses.length && responses.length == signatureDataArray.length);
        for (uint256 i; i < tasks.length;) {
            try taskManager.respondToTask(tasks[i], responses[i], signatureDataArray[i]) {
                emit BatchRespondResult(responses[i].taskId, true, "");
                unchecked { ++successCount; }
            } catch (bytes memory reason) {
                emit BatchRespondResult(responses[i].taskId, false, reason);
            }
            unchecked { ++i; }
        }
    }
}
Access control: The BatchTaskManager address is registered as a task generator via OperatorRegistry.addTaskGenerator(). The contract itself needs no auth modifier — only registered task generators can call taskManager.createNewTask (the contract is the msg.sender to TaskManager). Optionally add an onlyAuthorized mapping to prevent gas griefing from arbitrary callers.
Files to create/modify
┌─────────────────────────────────────────────────┬────────────────────────────────────────────────────────────┐
│ File                                            │ Action                                                     │
├─────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
│ contracts/src/BatchTaskManager.sol              │ Create — batch contract                                    │
├─────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
│ contracts/src/interfaces/IBatchTaskManager.sol  │ Create — interface for Rust bindings                       │
├─────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
│ contracts/test/BatchTaskManager.t.sol           │ Create — unit tests                                        │
├─────────────────────────────────────────────────┼────────────────────────────────────────────────────────────┤
│ contracts/script/BatchTaskManagerDeployer.s.sol │ Create — deployment script following DeploymentLib pattern │
└─────────────────────────────────────────────────┴────────────────────────────────────────────────────────────┘
Deployment steps
1. Deploy BatchTaskManager with existing TaskManager address
2. Call OperatorRegistry.addTaskGenerator(batchTaskManagerAddress)
3. Write deployment to script/deployments/batch-task-manager/{chainId}-{env}.json
4. Run make generate-bindings to auto-generate Rust types in crates/core/src/generated/
---
Layer 2: Batch Accumulator (Rust — replaces tx_worker.rs)
What
Replace the sequential TransactionSubmissionWorker with BatchTransactionSubmitter that accumulates tasks into time-windowed batches and submits via the BatchTaskManager contract.
Architecture
sync.rs ─────► CreateBatch Accumulator ──flush every 2s──► batchCreateTasks() on-chain
                      │
                      (tracks confirmations via DashMap)
                      │
sync.rs ─────► RespondBatch Accumulator ─flush after creates confirm──► batchRespondToTasks()
Two independent accumulator loops as background tokio tasks, connected by a shared DashMap<TaskId, ()> that tracks which creates have been confirmed on-chain.
Batch flush triggers
1. Timer: batch_interval_ms elapses (default 2000ms)
2. Size: the batch reaches max_batch_size
The two accumulator loops run and flush independently.
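The two triggers can be sketched as a plain struct (a single-threaded stand-in for the tokio loop; field names mirror the config, but the type and methods are illustrative):

```rust
use std::time::{Duration, Instant};

// Minimal accumulator with the two flush triggers: size and timer.
struct Accumulator<T> {
    buf: Vec<T>,
    max_batch_size: usize,
    batch_interval: Duration,
    last_flush: Instant,
}

impl<T> Accumulator<T> {
    fn new(max_batch_size: usize, batch_interval: Duration) -> Self {
        Self { buf: Vec::new(), max_batch_size, batch_interval, last_flush: Instant::now() }
    }

    // Size trigger: flush as soon as the batch is full.
    fn push(&mut self, item: T) -> Option<Vec<T>> {
        self.buf.push(item);
        (self.buf.len() >= self.max_batch_size).then(|| self.flush())
    }

    // Timer trigger: called on every interval tick.
    fn tick(&mut self) -> Option<Vec<T>> {
        (self.last_flush.elapsed() >= self.batch_interval && !self.buf.is_empty())
            .then(|| self.flush())
    }

    fn flush(&mut self) -> Vec<T> {
        self.last_flush = Instant::now();
        std::mem::take(&mut self.buf)
    }
}
```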
Ordering invariant
The respond accumulator holds items in a pending queue. On each flush tick, it moves items whose task_id appears in the confirmed_creates: DashMap<TaskId, ()> into the current respond batch. Items waiting longer than respond_wait_timeout_ms
(default 60s) are dropped with an error log.
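A single-threaded sketch of that gating, with a plain HashSet standing in for the shared DashMap (the type and method names here are illustrative, not the plan's actual API):

```rust
use std::collections::HashSet;

// Responds are released only once their create has confirmed;
// anything waiting past the timeout is dropped.
struct RespondGate {
    confirmed_creates: HashSet<u64>, // task ids whose create tx has confirmed
    pending: Vec<(u64, u64)>,        // (task_id, enqueued_at_ms)
    wait_timeout_ms: u64,
}

impl RespondGate {
    // Called on each flush tick: returns responds ready for this batch,
    // keeps unconfirmed ones, and drops anything past the wait timeout.
    fn drain_ready(&mut self, now_ms: u64) -> Vec<u64> {
        let confirmed = &self.confirmed_creates;
        let timeout = self.wait_timeout_ms;
        let mut ready = Vec::new();
        self.pending.retain(|&(task_id, enqueued_at_ms)| {
            if confirmed.contains(&task_id) {
                ready.push(task_id);
                false // move into the current respond batch
            } else if now_ms.saturating_sub(enqueued_at_ms) > timeout {
                false // create never confirmed; drop (with an error log in the real impl)
            } else {
                true // keep waiting for the create to confirm
            }
        });
        ready
    }
}
```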
Public interface (backward compatible)
BatchTransactionSubmitter exposes the same methods as TransactionSubmissionWorker:
- submit_create_task(task_id, params) -> oneshot::Receiver<Result<...>>
- submit_aggregated_response(task_id, task_request, service_response) -> Result<()>
- queue_capacity() -> usize / queue_depth() -> usize
This means sync.rs and handler/mod.rs only swap the type name.
Config
pub struct BatchSubmitterConfig {
    pub enabled: bool,                   // default: false (feature-gated rollout)
    pub max_create_batch_size: usize,    // default: 120
    pub max_respond_batch_size: usize,   // default: 30
    pub batch_interval_ms: u64,          // default: 2000
    pub respond_wait_timeout_ms: u64,    // default: 60000
    pub batch_contract_address: Address, // from deployment JSON
}
Files to create/modify
┌───────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────────────────┐
│ File                                          │ Action                                                                           │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/rpc/api/batch_submitter.rs │ Create — new batch accumulator                                                   │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/rpc/api/mod.rs             │ Modify — add mod batch_submitter export                                          │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/config.rs                  │ Modify — add BatchSubmitterConfig to GatewayConfig                               │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/handler/mod.rs             │ Modify — instantiate BatchTransactionSubmitter when enabled (lines ~549, ~856)   │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/chainio/src/avs/writer.rs              │ Modify — add batch_create_tasks() and batch_respond_to_tasks() methods           │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/task/submitter.rs          │ Modify — add batch methods to TaskSubmitter trait (or create BatchTaskSubmitter) │
└───────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────┘
Migration path
1. Phase A: Ship behind batch_submitter.enabled = false. Old TransactionSubmissionWorker remains default.
2. Phase B: Enable on stagef, load test. Flip default to true.
3. Phase C: Remove TransactionSubmissionWorker after production validation.
---
Layer 3: Nonce Pipelining
What
After batching reduces txs from 240/block to ~5/block, we still need those 5 txs submitted without blocking on each receipt sequentially. Split send() from get_receipt().
Design
In the batch flush path:
// Fire phase: submit all batch txs with incrementing nonces
let mut pending_txs = Vec::new();
for batch in batches {
    let pending = batch_contract.batchCreateTasks(batch).send().await?;
    pending_txs.push(pending);
}

// Await phase: concurrently wait for all receipts
let receipts = futures::future::join_all(
    pending_txs.into_iter().map(|tx| tx.get_receipt())
).await;
Alloy's nonce manager auto-increments on each send() call, so nonces [N, N+1, N+2, ...] are assigned correctly without manual tracking.
Recovery
If tx at nonce N fails after N+1 is in mempool: N+1 is stuck. The recovery path:
1. Detect the failure from the receipt
2. Re-submit N with bumped gas (replacement tx)
3. Or cancel N with a zero-value self-transfer and re-submit the batch
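For step 2, the replacement must exceed the node's minimum fee bump to be accepted into the mempool; a sketch assuming geth's default 10% price-bump threshold (other clients and pool configs may differ):

```rust
// Compute the minimum fee for a same-nonce replacement tx. The 10%
// floor is geth's default (--txpool.pricebump); treat it as an
// assumption, not a protocol guarantee.
fn bumped_fee(stuck_fee_wei: u128, bump_percent: u128) -> u128 {
    stuck_fee_wei + stuck_fee_wei * bump_percent / 100
}

fn main() {
    // A tx stuck at 20 gwei max fee needs at least 22 gwei to replace.
    println!("{}", bumped_fee(20_000_000_000, 10));
}
```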
Files to create/modify
┌───────────────────────────────────────────────┬──────────────────────────────────────────────────────────────────────┐
│ File                                          │ Action                                                               │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┤
│ crates/chainio/src/avs/writer.rs              │ Modify — add send_without_receipt() variant for pipelined submission │
├───────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┤
│ crates/gateway/src/rpc/api/batch_submitter.rs │ Modify — use pipelined send in flush logic                           │
└───────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────┘
---
Layer 4: Multi-Signer Pool (Optional β€” only if needed)
What
Round-robin across N signer keys per replica. Each signer has independent nonce space.
When to implement
Only if load testing after Layers 1-3 shows a single signer can't keep up. The math says 5 txs/block is well within single-signer capacity, so this is likely unnecessary.
Design
pub struct SignerPool {
    signers: Vec<Arc<AvsWriter>>,
    next: AtomicUsize,
}
All signer addresses call the same BatchTaskManager contract (which is the registered task generator). No on-chain changes needed beyond Layer 1.
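The round-robin rotation can be sketched generically (the real pool would hold Arc&lt;AvsWriter&gt; values; the constructor and method names here are illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Round-robin signer selection: each call hands out the next signer,
// wrapping around the pool.
struct SignerPool<T> {
    signers: Vec<T>,
    next: AtomicUsize,
}

impl<T> SignerPool<T> {
    fn new(signers: Vec<T>) -> Self {
        assert!(!signers.is_empty(), "pool needs at least one signer");
        Self { signers, next: AtomicUsize::new(0) }
    }

    // fetch_add wraps on overflow, so the modulo keeps the index in
    // bounds even after usize::MAX selections.
    fn next_signer(&self) -> &T {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.signers.len();
        &self.signers[i]
    }
}
```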
Files
┌───────────────────────────────────┬────────────────────────────────────────────────────────┐
│ File                              │ Action                                                 │
├───────────────────────────────────┼────────────────────────────────────────────────────────┤
│ crates/gateway/src/signer_pool.rs │ Create (only if needed)                                │
├───────────────────────────────────┼────────────────────────────────────────────────────────┤
│ crates/gateway/src/config.rs      │ Modify — add additional_signers to TaskGeneratorConfig │
└───────────────────────────────────┴────────────────────────────────────────────────────────┘
---
Expected Performance
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric β”‚ Current β”‚ After Fix β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Offchain response β”‚ 2.8s (OK) β”‚ 2.8s (unchanged) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ On-chain task creation β”‚ p50=50s β”‚ ~14s (2s accumulate + 12s block) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ On-chain e2e β”‚ p50=138s β”‚ ~28s (create confirm + 2s + 12s block) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Completion rate β”‚ 41% β”‚ >99% β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Max throughput/replica β”‚ ~2-4 txs/min β”‚ 10 tasks/sec (120/block) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Txs per block/replica β”‚ 240 (impossible) β”‚ 5 (trivial) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
---
Implementation Sequence
Step 1: BatchTaskManager contract (no Rust dependency)
1. Create contracts/src/BatchTaskManager.sol and interface
2. Create contracts/test/BatchTaskManager.t.sol with tests:
- All succeed, partial failure, auth checks, gas measurement
3. Create deployment script following DeploymentLib pattern
4. forge test && forge fmt
5. Run make generate-bindings
Step 2: AvsWriter batch methods (depends on Step 1 bindings)
1. Add batch_create_tasks() and batch_respond_to_tasks() to AvsWriter
2. Unit tests for the new methods
Step 3: BatchTransactionSubmitter (depends on Step 2)
1. Implement BatchTransactionSubmitter in batch_submitter.rs
2. Add BatchSubmitterConfig to GatewayConfig
3. Wire into handler/mod.rs behind enabled flag
4. Unit tests for accumulator logic, ordering invariant, flush triggers
Step 4: Nonce pipelining (depends on Step 3)
1. Add pipelined send to the flush path
2. Add nonce gap recovery logic
Step 5: Integration testing
1. Update E2E tests to exercise batch path
2. Load test: 100 tasks/sec sustained for 5 minutes
3. Verify ordering invariant
4. Verify partial batch failure handling
Step 6: Multi-signer pool (only if Step 5 load test shows need)
---
Verification
1. forge test β€” all BatchTaskManager contract tests pass
2. make clippy β€” no new warnings
3. make tests β€” existing unit tests unaffected (feature-gated)
4. make gateway_e2e β€” existing E2E still works with old worker (default)
5. make gateway_e2e with BATCH_SUBMITTER_ENABLED=true β€” E2E works with new batch path
6. Load test at 100 tasks/sec for 5 min: >99% completion, <30s on-chain e2e
---
Risks and Mitigations
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Risk β”‚ Mitigation β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Batch contract deployment fails β”‚ Old sequential worker remains as fallback (feature flag) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Partial batch failure loses tasks β”‚ try/catch per item; failures logged with task ID for retry β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Nonce gap from failed pipeline tx β”‚ Re-query nonce, re-submit remaining; fallback to sequential β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Respond submitted before create confirms β”‚ DashMap gating enforces ordering; items time out rather than submit early β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Gas estimation wrong for large batches β”‚ Conservative defaults (120 creates, 30 responds); configurable per env β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Offchain response latency regresses β”‚ Batch accumulator is completely decoupled from RPC response path β€” no impact β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
@denniswon
Copy link
Copy Markdown
Author

Good β€” so GATEWAY__TASK_GENERATOR__SIGNER__PRIVATE_KEY env var overrides the TOML config's gateway.task_generator.signer.private_key
via the __ separator convention. Each ECS service gets a different value from AWS Secrets Manager (TASK_GENERATOR_KEY_1 through
TASK_GENERATOR_KEY_10), injected as the same env var name.

This means each replica has its own independent nonce space. My original Root Cause 2 (nonce contention across replicas) is wrong. Let
me revise the analysis.


Revised Load Test Failure Analysis

Root Cause 1 (P0): Empty policy_params crashes ALL operator evaluations

This is unchanged and remains the dominant issue. Every quorum failure in the logs traces back to:

policy_params JSON is not valid policyParams object: EOF while parsing a value at line 1 column 0

At crates/chainio/src/identity_data.rs:364-368, empty policyParams bytes from the on-chain PolicyConfig are converted to "", which
serde_json::from_str can't parse. Both operators fail, quorum is 0%, every task fails.

The fix is a single guard before line 364:

if ctx.policy_params.is_empty() {
    info!("Empty policy_params, proceeding without identity data.");
    return Ok(None);
}

Root Cause 2 (Revised): Batch create transient errors β€” NOT nonce contention

Since each replica has its own signer key, the "batch create: transient error" pattern is not caused by cross-replica nonce races. The
actual cause is likely one of:

  1. RPC provider rate limiting or instability under load β€” 10 replicas simultaneously hitting Sepolia RPC endpoints (Tenderly gateway)
    could trigger rate limits (429) or internal errors (503)
  2. Per-replica nonce races between aggregator + batch submitter β€” within a single replica, the AvsWriter is shared via
    new_with_provider (shared nonce manager), but if there's a code path creating a second provider instance, intra-replica nonce conflict
    is possible
  3. Sepolia mempool congestion β€” under load testing, many batch txs from 10 different signers could cause mempool-level issues

The Datadog CSV doesn't show the actual error. The structured field error = %e is logged at WARN level (batch_submitter.rs:432), but
the CSV only captures the message template. To diagnose this precisely, we need the structured error field from Datadog β€” either via
the Datadog UI facets or by adding the error to the message string itself.
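The "add the error to the message string" option can be sketched as below. The function name is illustrative (the real call sites are in batch_submitter.rs); the point is that a structured field like `error = %e` can be dropped by a log pipeline that only keeps the message template, while text inlined into the message survives any sink.

```rust
// Sketch: inline the error into the message so pipelines that capture only
// the message template still surface it. Names are hypothetical.
fn batch_create_error_message(e: &str) -> String {
    // Before: warn!(error = %e, "batch create: transient error");
    // After: the error text is part of the message itself.
    format!("batch create: transient error: {e}")
}

fn main() {
    let msg = batch_create_error_message("429 Too Many Requests");
    assert!(msg.contains("429 Too Many Requests"));
    println!("{msg}");
}
```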

β˜… Insight ─────────────────────────────────────
The transient batch create errors are a secondary issue β€” they only affect on-chain task creation, not the evaluation path. Even if
batch creates succeed, the tasks would still fail quorum because of Root Cause 1 (empty policy_params). Fixing Root Cause 1 first is
critical because it blocks ALL task evaluations regardless of batch submission health.
─────────────────────────────────────────────────

Root Cause 3: Zombie aggregator tasks (360s timeout with 0 signatures)

Unchanged β€” these are the downstream effect of quorum failure. When centralized broadcast fails, the TaskAggregator slot sits idle for
360s. The fix is to cancel the aggregator immediately when quorum fails.
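The eager-cancel fix can be sketched as follows. It is a simplified model of the described aggregator, assuming hypothetical names patterned on the transcript (`task_states`, `task_response_receivers`, `initialize_task`, `cancel_task`), with `HashMap` standing in for `DashMap` and a unit value standing in for the oneshot receiver.

```rust
use std::collections::HashMap;

// Simplified model of the aggregator state described above.
struct AggregatorCore {
    task_states: HashMap<String, Vec<u8>>,        // task_id -> collected signatures
    task_response_receivers: HashMap<String, ()>, // stand-in for oneshot receivers
}

impl AggregatorCore {
    fn initialize_task(&mut self, task_id: &str) {
        self.task_states.insert(task_id.to_string(), Vec::new());
        self.task_response_receivers.insert(task_id.to_string(), ());
    }

    // Must remove from BOTH maps, or the slot leaks until the TTL sweep.
    fn cancel_task(&mut self, task_id: &str) {
        self.task_states.remove(task_id);
        self.task_response_receivers.remove(task_id);
    }
}

// Simulated broadcast that fails quorum.
fn broadcast(_task_id: &str) -> Result<(), &'static str> {
    Err("quorum not reached")
}

fn main() {
    let mut core = AggregatorCore {
        task_states: HashMap::new(),
        task_response_receivers: HashMap::new(),
    };
    core.initialize_task("task-1");
    if broadcast("task-1").is_err() {
        // Eager cleanup on the failure path instead of waiting ~360s for TTL.
        core.cancel_task("task-1");
    }
    assert!(core.task_states.is_empty());
    assert!(core.task_response_receivers.is_empty());
    println!("slots after failure: {}", core.task_states.len());
}
```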


Revised Priority

β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ # β”‚ Issue β”‚ Impact β”‚ Fix β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ P0 β”‚ Empty policy_params β†’ operator β”‚ 100% of tasks fail β”‚ Early return in identity_data.rs if β”‚
β”‚ β”‚ deserialization crash β”‚ quorum β”‚ policy_params.is_empty() β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ P1 β”‚ Batch create transient errors (unknown β”‚ ~30% of batch creates β”‚ Surface actual error in log message; investigate RPC β”‚
β”‚ β”‚ RPC cause) β”‚ fail β”‚ provider limits β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ P2 β”‚ Zombie aggregator slots (360s idle) β”‚ Memory/resource waste β”‚ Cancel aggregator on quorum failure β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The empty policy_params fix is a one-liner that eliminates every quorum failure in this log. The batch create issue needs better
observability (surface the actual error string) before we can solve it. Want me to implement the P0 fix now?

@denniswon
Copy link
Copy Markdown
Author

β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ # β”‚ Claim β”‚ Status β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1 β”‚ fetch_identity_data crashes on empty policy_params β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 2 β”‚ Guard returns Ok(None) before JSON parse β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 3 β”‚ AggregatorCore DashMap fields β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 4 β”‚ cancel_task removes from both maps β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 5 β”‚ Both quorum failure paths call cancel_task β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 6 β”‚ Trait method added to AggregatorService β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 7 β”‚ 6 mock impls + 1 real impl = 7 total β”‚ VERIFIED β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 8 β”‚ Error inlined in batch log message strings β”‚ VERIFIED β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

No false statements found. PR #492 is ready: https://github.com/newt-foundation/newton-prover-avs/pull/492

✻ Worked for 1m 21s

❯ /docs-sync

⏺ Let me run through the docs-sync phases.

⏺ Bash(git log --oneline $(git merge-base HEAD main)..HEAD)
⎿  4604667f fix: guard empty policy_params, surface batch errors, cancel zombie aggregators

⏺ Bash(git diff $(git merge-base HEAD main)..HEAD --stat)
⎿   crates/aggregator/src/core.rs | 10 ++++++++++
crates/chainio/src/identity_data.rs | 7 +++++++
crates/gateway/src/rpc/api/batch_submitter.rs | 13 +++++--------
… +8 lines (ctrl+o to expand)

⏺ Single commit, 10 files. The changes are: (1) empty policy_params guard, (2) batch error log formatting, (3) cancel_task trait method + impls. Let me check if any docs reference these areas.

⏺ Bash(git diff $(git merge-base HEAD main)..HEAD)
⎿  diff --git a/crates/aggregator/src/core.rs b/crates/aggregator/src/core.rs
index 792238fe..64dcd2b8 100644
--- a/crates/aggregator/src/core.rs
… +78 lines (ctrl+o to expand)

⏺ Now let me check if any documentation references these areas and needs updating.

Searched for 2 patterns (ctrl+o to expand)

⏺ The docs reference fetch_identity_data and policy_params but describe the flow correctly β€” they mention parsing policy_params for
identity_domain. The new guard is a defensive check that happens before that parsing and doesn't change the documented flow. No doc
updates needed for the identity path.

Now check if lessons.md should get a new entry β€” this is the kind of production gotcha that belongs there:

Searched for 2 patterns (ctrl+o to expand)

⏺ No existing entry for this. Let me add it to lessons.md and the Common Pitfalls table in CLAUDE.md. Let me also check the Common
Pitfalls table format:

Read 2 files (ctrl+o to expand)
⎿  Loaded .claude/CLAUDE.md
⎿  Loaded .claude/rules/CLAUDE.md

⏺ Good. I need to add two things:

  1. A new Common Pitfalls entry for empty policy_params
  2. A new lessons.md entry

Let me find the right insertion point in lessons.md for the identity/on-chain data section:

Searched for 1 pattern (ctrl+o to expand)

⏺ I'll add the new lesson under "Identity Domain Dispatch" (relevant section) and add a Common Pitfalls entry. Also, the "Adding a
method to a trait breaks all mock implementations" lesson at line 176 is directly relevant to what we just did β€” good that it's
already there.

Read 1 file (ctrl+o to expand)

⏺ Now apply both doc updates:

⏺ Update(.claude/rules/lessons.md)
⎿  Added 16 lines
368
369 New domains require changes in: libs/regorus (Rego extensions + feature flag), crates/core (re-export new type), crates/chainio (new IdentityDomain variant + deserialization arm), and crates/operator (if the Rego return type changes). See docs/IDENTITY_DOMAIN_GUIDE.md for the step-by-step checklist.
370
371 +### Empty policy_params (0x) crashes identity data processing
372 +
373 +When PolicyConfig.policyParams is empty bytes on-chain (common for policy clients without identity domains), fetch_identity_data tried to parse the empty string as JSON, failing with "EOF while parsing a value at line 1 column 0". This crashed every operator's identity processing for that task, producing 100% quorum failure.
374 +
375 +Prevention: Always guard Bytes fields from on-chain with an emptiness check before parsing. The contract returns 0x (empty) for unset bytes fields β€” never assume they contain valid encoded data.
376
377 ---
378
379 +## Resource Cleanup
380 +
381 +### Cancel BLS aggregation slots on broadcast failure
382 +
383 +initialize_task creates a DashMap entry in task_states and a oneshot receiver in task_response_receivers. If the subsequent broadcast fails (quorum not reached, operator errors), these slots become zombies waiting for signatures that will never arrive. They persist until the periodic cleanup_stale_tasks TTL sweep (default 6 minutes). Under load with frequent failures, this leaks memory proportional to failure rate x TTL.
384 +
385 +Prevention: Call aggregator.cancel_task(&task_id) in every error path that follows initialize_task. Both the two-phase and centralized broadcast error handlers in sync.rs must include this cleanup.
386 +
387 +---
388 +
389 ## Database Migration Gotchas
390
391 ### sqlx migration tracking must be consistent

⏺ Now add the Common Pitfalls entry to both CLAUDE.md files (root and .claude/CLAUDE.md must stay in sync per docs-sync rules):

⏺ Update(CLAUDE.md)
⎿  Added 1 line
252 | QuorumNotReached with no operator error details | Unified quorum broadcast silently drops operator RPC errors (policy eval failures) | Check operator_errors field in CreateTaskResponse; see operator.rs broadcast loops |
253 | BLS aggregation completes but no on-chain submission logs | Batch accumulator loop hung on get_receipt().await (tx dropped from mempool) | Restart gateway; BATCH_RECEIPT_TIMEOUT (60s) in writer.rs prevents recurrence |
254 | "batch_create_tasks: all items succeeded" but task not on-chain | get_receipt() returned receipt with status: false (on-chain revert) without error | receipt.status() check now returns CreateNewTaskCallFail with tx hash |
255 +| QuorumNotReached with identity deserialization error in operator logs | PolicyConfig.policyParams is empty bytes (0x) on-chain | fetch_identity_data now guards with is_empty() before JSON parse; policy clients without identity domains skip identity processing |
256
257 ## Multichain Configuration
258

⏺ Update(.claude/CLAUDE.md)
⎿  Added 1 line
252 | QuorumNotReached with no operator error details | Unified quorum broadcast silently drops operator RPC errors (policy eval failures) | Check operator_errors field in CreateTaskResponse; see operator.rs broadcast loops |
253 | BLS aggregation completes but no on-chain submission logs | Batch accumulator loop hung on get_receipt().await (tx dropped from mempool) | Restart gateway; BATCH_RECEIPT_TIMEOUT (60s) in writer.rs prevents recurrence |
254 | "batch_create_tasks: all items succeeded" but task not on-chain | get_receipt() returned receipt with status: false (on-chain revert) without error | receipt.status() check now returns CreateNewTaskCallFail with tx hash |
255 +| QuorumNotReached with identity deserialization error in operator logs | PolicyConfig.policyParams is empty bytes (0x) on-chain | fetch_identity_data now guards with is_empty() before JSON parse; policy clients without identity domains skip identity processing |
256
257 ## Multichain Configuration
258

⏺ Now verify comments in the changed code are accurate, and check for any dedup issues per docs-sync rules:

Searched for 1 pattern (ctrl+o to expand)

⏺ Let me verify that "6 minutes" claim β€” check the actual TTL value:

Searched for 2 patterns (ctrl+o to expand)

⏺ The TTL is task_response_window_block * block_time_ms * 2 / 1000. It's not a fixed 6 minutes β€” it's dynamic based on config. Let me
check typical values:

Searched for 1 pattern (ctrl+o to expand)

⏺ 30 blocks. For Sepolia (~12s block time): 30 * 12000 * 2 / 1000 = 720s = 12 minutes. For local anvil with block_time=1: 30 * 1000 * 2
/ 1000 = 60s. The "6 minutes" claim is wrong β€” it varies. Let me fix the doc comment to say "several minutes" instead of a specific
number:

⏺ Update(crates/gateway/src/task/aggregator.rs)
⎿  Added 1 line, removed 1 line
116 ///
117 /// Called on quorum failure or broadcast error to free the BLS aggregation state
118 /// that initialize_task created. Without this, zombie slots wait for signatures
119 - /// that will never arrive until TTL cleanup runs (default 6 minutes).
119 + /// that will never arrive until the periodic TTL cleanup sweep runs.
120 fn cancel_task(&self, task_id: &TaskId);
121
122 /// Remove stale tasks older than the given TTL.

⏺ Also fix the same claim in lessons.md:

⏺ Update(.claude/rules/lessons.md)
⎿  Added 1 line, removed 1 line
380
381 ### Cancel BLS aggregation slots on broadcast failure
382
383 -initialize_task creates a DashMap entry in task_states and a oneshot receiver in task_response_receivers. If the subsequent broadcast fails (quorum not reached, operator errors), these slots become zombies waiting for signatures that will never arrive. They persist until the periodic cleanup_stale_tasks TTL sweep (default 6 minutes). Under load with frequent failures, this leaks memory proportional to failure rate x TTL.
383 +initialize_task creates a DashMap entry in task_states and a oneshot receiver in task_response_receivers. If the subsequent broadcast fails (quorum not reached, operator errors), these slots become zombies waiting for signatures that will never arrive. They persist until the periodic cleanup_stale_tasks TTL sweep (TTL = task_response_window_block * block_time * 2, typically 60s local / 12min Sepolia). Under load with frequent failures, this leaks memory proportional to failure rate x TTL.
384
385 Prevention: Call aggregator.cancel_task(&task_id) in every error path that follows initialize_task. Both the two-phase and centralized broadcast error handlers in sync.rs must include this cleanup.
386

⏺ Good. Now amend the commit to include the doc updates (since they're part of the same logical change):

⏺ Bash(git diff --stat)
⎿   crates/gateway/src/task/aggregator.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

⏺ The CLAUDE.md and lessons.md changes aren't showing because they're gitignored (.claude/ is local-only per memory rules). The only
trackable change is the aggregator doc comment fix. Let me commit that:

⏺ Bash(git add crates/gateway/src/task/aggregator.rs && git commit -m "docs: fix cancel_task TTL claim in doc comment" && git push
origin HEAD 2>&1)
⎿  [fix/empty-policy-params-guard 28ea2d5f] docs: fix cancel_task TTL claim in doc comment
1 file changed, 1 insertion(+), 1 deletion(-)
remote:
… +6 lines (ctrl+o to expand)

⏺ Docs-sync complete. Summary:

Files modified:

  • crates/gateway/src/task/aggregator.rs β€” fixed "default 6 minutes" claim to accurate "periodic TTL cleanup sweep" (committed +
    pushed)
  • .claude/rules/lessons.md β€” added two new entries: "Empty policy_params crashes identity data processing" and "Cancel BLS aggregation
    slots on broadcast failure" (local-only)
  • CLAUDE.md + .claude/CLAUDE.md β€” added Common Pitfalls entry for empty policy_params quorum failure (local-only)

Key documentation changes:

  • New Common Pitfalls entry: QuorumNotReached with identity deserialization error when policyParams is empty 0x
  • New lessons learned: empty Bytes from on-chain must always be guarded before parsing; aggregation slots must be cancelled on
    broadcast failure
  • Fixed inaccurate TTL duration claim (was "6 minutes", actual value is config-dependent: 60s local, ~12min Sepolia)

No gaps or ambiguities remaining. The code comments in the diff are accurate and the "why" context is present where needed. No stale
comments were found.

✻ Baked for 2m 31s

❯ /commit-git

⏺ Bash(git status && git diff --stat)
⎿  On branch fix/empty-policy-params-guard
nothing to commit, working tree clean

⏺ Working tree is clean β€” everything is already committed and pushed. The .claude/ changes (lessons.md, CLAUDE.md) are local-only files
that aren't tracked by git (gitignored per project rules).

Current state:

@denniswon
Copy link
Copy Markdown
Author

The Dual Aggregation Path Problem

There are two completely independent ways BLS signatures get aggregated in this codebase:

Path A β€” Channel-based (designed for the watcher/event-driven path):

  1. initialize_task creates a DashMap entry in task_states and a oneshot::Receiver in task_response_receivers
  2. Operators submit signatures via the aggregator's JSON-RPC server (process_signed_task_response)
  3. The RPC handler puts each signature into the DashMap slot
  4. wait_for_aggregation blocks on the oneshot receiver until quorum is reached or timeout

Path B β€” Direct (used by the gateway RPC path β€” both two-phase and centralized):

  1. The gateway broadcasts newt_evaluateAndSign to all operators simultaneously
  2. Operators return signed responses directly in the HTTP response body
  3. The gateway collects these into a Vec
  4. aggregate_responses takes that Vec and aggregates the BLS signatures in one shot β€” no channel, no waiting

The problem: the gateway always calls initialize_task first (to set up operator tracking metadata), then uses Path B
(aggregate_responses). Path A's channel slot is created but never consumed. It sits in the DashMap waiting for signatures that will
never arrive via the RPC path, because the signatures already came back inline in the HTTP responses.

After ~360s, the periodic cleanup_stale_tasks sweep finds the slot, sees 0 signatures processed, and logs "task expired without
reaching quorum" β€” a completely misleading error for a task that actually completed successfully.

Now let me find the TTL calculation.

Searched for 1 pattern (ctrl+o to expand)

⏺ Now let me find where the TTL is calculated β€” the * 2 multiplier.

Searched for 1 pattern (ctrl+o to expand)

⏺ Found it at crates/gateway/src/handler/mod.rs:1417-1419:

// TTL is 2x task response window to ensure tasks are definitely expired
let task_ttl =
    std::time::Duration::from_secs((self.task_response_window_block * self.block_time_ms * 2) / 1000);

The * 2 was a safety margin because zombie slots might still be "in progress" when the cleanup runs. But now that we cancel slots
eagerly on both success and error paths, the only slots that survive to cleanup are truly orphaned (e.g., gateway crash mid-task).
The 2x buffer is unnecessary and just delays cleanup of any edge-case leftovers.
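The effect of dropping the multiplier can be checked with a one-line model of the quoted formula. The window and block-time values below are the ones already cited in this thread (30 blocks; 12s Sepolia blocks, 1s local anvil blocks).

```rust
// Sketch of the quoted TTL formula, with and without the *2 safety margin.
fn task_ttl_secs(window_blocks: u64, block_time_ms: u64, multiplier: u64) -> u64 {
    (window_blocks * block_time_ms * multiplier) / 1000
}

fn main() {
    // With the *2 margin (current code):
    assert_eq!(task_ttl_secs(30, 12_000, 2), 720); // Sepolia: 12 minutes
    assert_eq!(task_ttl_secs(30, 1_000, 2), 60);   // local anvil: 60s
    // Without it (proposed): orphaned slots are swept in half the time.
    assert_eq!(task_ttl_secs(30, 12_000, 1), 360); // Sepolia: 6 minutes
    println!("ok");
}
```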

Read 1 file (ctrl+o to expand)
⎿  Loaded crates/gateway/src/handler/CLAUDE.md

⏺ β˜… Insight ─────────────────────────────────────

  • The * 2 multiplier was a defensive buffer: "maybe the task is still being processed when the cleanup sweep runs." With eager
    cancel_task on all paths, the only slots reaching cleanup are truly abandoned (crash, network partition). The response window itself
    (task_response_window_block * block_time_ms) is already the maximum legitimate task lifetime.
  • Removing * 2 halves the cleanup TTL β€” e.g., from ~720s to ~360s on Sepolia (30 blocks * 12s * 2 β†’ 30 * 12). Any crash-orphaned
    slots get cleaned up faster.
    ─────────────────────────────────────────────────

@denniswon
Copy link
Copy Markdown
Author

⏺ SendTaskParams is defined in newton_prover_chainio::avs::writer. Let me find the struct definition there.

Searched for 1 pattern (ctrl+o to expand)

⏺ Now I have the full picture. Let me also check how batch_create_tasks in the writer uses the Task struct β€” specifically whether the
contract's createTask stores or validates taskCreatedBlock.

Read 1 file (ctrl+o to expand)
⎿  Loaded contracts/src/CLAUDE.md
⎿  Loaded contracts/src/libraries/CLAUDE.md

⏺ Now I have all the constraints. Let me also check the respond path β€” TaskResponseTooLate at line 178:

require(
    blockNumber < task.taskCreatedBlock + responseWindowBlock,
    TaskResponseTooLate(blockNumber, task.taskCreatedBlock, responseWindowBlock)
);

So the two on-chain time constraints are:

  1. Create: block.number - task.taskCreatedBlock <= taskCreationBufferWindow (10 blocks = 120s)
  2. Respond: block.number < task.taskCreatedBlock + taskResponseWindowBlock (30 blocks = 360s)

The respond deadline is relative to taskCreatedBlock. If we refresh taskCreatedBlock at flush time for creates, the respond window
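The two window checks, and how a refreshed taskCreatedBlock shifts the respond deadline, can be modeled as plain block-number arithmetic. The constants below are the ones quoted above (buffer = 10 blocks, response window = 30 blocks); function names are illustrative, not the contract's.

```rust
// Create is valid while the task is within the creation buffer window.
fn create_ok(current_block: u64, task_created_block: u64, buffer: u64) -> bool {
    current_block - task_created_block <= buffer
}

// Respond is valid strictly before taskCreatedBlock + responseWindowBlock.
fn respond_ok(current_block: u64, task_created_block: u64, window: u64) -> bool {
    current_block < task_created_block + window
}

fn main() {
    let created = 1_000;
    assert!(create_ok(1_010, created, 10));   // at the buffer edge: still valid
    assert!(!create_ok(1_011, created, 10));  // one block past: create rejected
    assert!(respond_ok(1_029, created, 30));  // last valid respond block
    assert!(!respond_ok(1_030, created, 30)); // window closed
    // Refreshing taskCreatedBlock at flush time shifts the respond deadline too:
    assert!(respond_ok(1_030, created + 5, 30));
    println!("ok");
}
```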
shifts forward too β€” buying more time for both.
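
The two constraints above can be mirrored as a quick Rust sketch β€” function names are invented for illustration; the real checks are Solidity `require`s in the TaskManager:

```rust
// Mirror of the two on-chain timing checks; names are illustrative only.

/// Create is valid while the task block is at most `buffer_window` blocks old.
fn create_in_window(current_block: u64, task_created_block: u64, buffer_window: u64) -> bool {
    current_block.saturating_sub(task_created_block) <= buffer_window
}

/// Respond must land strictly before the response window closes.
fn respond_in_window(current_block: u64, task_created_block: u64, response_window: u64) -> bool {
    current_block < task_created_block + response_window
}
```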

Now let me understand the critical constraint: BLS aggregation uses taskCreatedBlock for operator set snapshot. If we change it at
flush time, does BLS aggregation still work?

Read 1 file (ctrl+o to expand)

⏺ Now I have the full picture. Here's the critical constraint:

In single-chain mode: aggregator_block == task_created_block_u64 (line 305). BLS signatures are computed against the operator set at
this block. The on-chain respondToTask verifies BLS signatures at task.taskCreatedBlock. These must match.

This means we cannot just change taskCreatedBlock at flush time independently β€” we'd break the BLS signature verification because the
aggregator already used the original block for operator set snapshot.

But wait β€” the create path and respond path are independent batch pipelines. The create batch only needs taskCreatedBlock for the
create validation (block.number - taskCreatedBlock <= buffer). The respond path uses taskCreatedBlock from the on-chain task for BLS
verification. So if we refresh the block at create-flush time, the on-chain task stores the fresh block, and the respond path reads
it from on-chain.

The problem is that BLS aggregation happens before the create lands on-chain β€” using the original task_created_block_u64. But
respondToTask calls checkSignatures with task.taskCreatedBlock from the on-chain task... which would now be the refreshed block, not
the one BLS was aggregated at.

Let me verify this by reading the respond flow.

Searched for 1 pattern, read 1 file (ctrl+o to expand)

⏺ Now let me check verifyTaskResponse in the SourceTaskResponseHandler β€” that's where BLS checkSignatures is called with
task.taskCreatedBlock.

Searched for 4 patterns (ctrl+o to expand)

⏺ Now I have the complete picture. Line 154 is the key:

checkSignatures(message, task.quorumNumbers, uint32(task.taskCreatedBlock), nonSignerStakesAndSignature)

The on-chain respondToTask calls checkSignatures with task.taskCreatedBlock β€” the value stored when createNewTask was called. BLS
aggregation on the gateway also uses task_created_block_u64 as the aggregation block (line 305 of sync.rs in single-chain mode).

These must match. So we can't just change taskCreatedBlock at flush time β€” the BLS signatures would have been computed against a
different operator set snapshot.

Here's the revised approach:

β˜… Insight ─────────────────────────────────────
The constraint chain:

  1. sync.rs sets task_created_block = current_block - offset at RPC time
  2. BLS initialize_task + aggregate_responses use this same block for operator set snapshot
  3. SendTaskParams.task_created_block carries this to the batch queue
  4. On-chain createTask validates block.number - taskCreatedBlock <= bufferWindow
  5. On-chain respondToTask β†’ checkSignatures(msg, quorums, task.taskCreatedBlock, sigs) verifies BLS against same block

So taskCreatedBlock is locked at RPC time because BLS aggregation binds to it. We can't change it later.

The real fix must happen upstream: reduce the time between setting taskCreatedBlock and the batch tx landing on-chain. Options:

  • Smaller batch flush intervals (less queue wait)
  • Larger taskCreationBufferWindow on-chain (more slack)
  • Or both
    ─────────────────────────────────────────────────

You're right that this is fundamentally a timing problem, but I was wrong about Option A β€” we cannot refresh taskCreatedBlock at
flush time because BLS signatures are already computed against the original block. The on-chain checkSignatures call in respondToTask
uses task.taskCreatedBlock, which must match what the aggregator used.

The fix has to be a combination of:

  1. Reduce batch flush interval β€” the default 2000ms accumulation window is the biggest controllable delay. Under load, items can sit
    in the queue for up to 2s before flushing. Cutting this to 500ms or 1000ms directly reduces staleness.
  2. Increase taskCreationBufferWindow beyond 10 β€” given the pipeline: 2s queue + 12-24s mining + retries with backoff + retry mining,
    10 blocks (120s) is tight. Bumping to 20 blocks (240s) gives comfortable headroom.

For the respond path, task_response_window_block = 30 (360s from taskCreatedBlock). The respond pipeline adds: create confirmation
wait (up to 60s timeout) + respond batch flush (2s) + respond tx mining (12-24s) + possible retries. Worst case ~100s on top of the
create latency. Since the respond window starts from taskCreatedBlock (which is already stale by the time create lands), the
effective respond window is 360s - create_pipeline_latency. If create takes 100s of the 120s buffer, only 260s remain for respond β€”
still plenty with 30 blocks.
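
The budget arithmetic above can be sketched as a small helper (purely illustrative; numbers mirror the Sepolia discussion):

```rust
/// Remaining respond budget after the create pipeline has consumed part of it.
/// The respond window starts at taskCreatedBlock (fixed at RPC time), so every
/// second of create-pipeline latency eats directly into it.
fn effective_respond_window_s(window_blocks: u64, block_time_s: u64, create_latency_s: u64) -> u64 {
    (window_blocks * block_time_s).saturating_sub(create_latency_s)
}
```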

Want me to:

  1. Reduce default_batch_interval_ms from 2000 to 500?
  2. Increase taskCreationBufferWindow to 20 on-chain (cast send commands)?
  3. Both?

✻ Cogitated for 2m 19s

❯ let's do 500ms for batch flush interval, and keep creation buffer window to about 120 seconds for all chains (so on L1, 10 blocks,
on other chains like base, let's do 30). Also for is_transient_rpc_error, "// Contract reverts are never transient β€” the same calldata
against the same
// state will always produce the same revert.
if lower.contains("execution reverted") {
return false;
}" what is better way to detect contract reverts?

⏺ Good questions. Let me look at the error type structure first to design better revert detection.

Searched for 1 pattern, read 1 file (ctrl+o to expand)

⏺ Now let me check how alloy errors actually look when they come through β€” specifically the TransportError structure.

Searched for 4 patterns (ctrl+o to expand)

⏺ Now I have a clear picture. Let me check how alloy ContractError flows β€” whether we can do typed matching instead of string matching.

Read 1 file (ctrl+o to expand)

⏺ Good. Now I understand the full error flow. The issue is that by the time errors reach is_transient_rpc_error, they're already
stringified β€” the typed ChainIoError variants are lost. The gateway wraps them in GatewayError::Internal(format!("...")).

β˜… Insight ─────────────────────────────────────
Why string matching is the only option here:
The batch submitter calls task_submitter.batch_create_tasks(), which returns a Result whose error type is GatewayError.
But GatewayError::Internal(String) wraps the full error chain as a formatted string β€” the typed ChainIoError::ContractError is
already lost. We can't do typed pattern matching at the batch submitter level.

Better string detection approach:
Instead of just checking for "execution reverted", we should use a whitelist approach β€” only classify errors as transient if they
match known transient patterns AND don't match any revert indicators. The current code already does this in the right order (check
revert first, then transient patterns), but the revert detection needs to be more comprehensive. Alloy formats reverts in multiple
ways:

  • "execution reverted" β€” standard EVM revert
  • "revert" with hex selector data
  • JSON-RPC error code 3 (EIP-1474 execution error)
  • "error code: 3" in ErrorResp formatting
    ─────────────────────────────────────────────────
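
A self-contained sketch of the revert-first ordering described above β€” the pattern lists here are illustrative, not the full production set:

```rust
/// Revert-first classification: any revert indicator wins over transient
/// patterns, so a message like "Internal server error: ... execution reverted"
/// is never retried.
fn is_transient(error: &str) -> bool {
    let lower = error.to_lowercase();
    let revert_patterns = ["execution reverted", "error code: 3,", "contracterror", "revert 0x"];
    if revert_patterns.iter().any(|p| lower.contains(p)) {
        return false; // deterministic contract revert: retrying cannot help
    }
    let transient_patterns = ["nonce", "timeout", "connection", "429", "503", "rate limit"];
    transient_patterns.iter().any(|p| lower.contains(p))
}
```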

OK, here's the plan. Let me implement all three changes:

  1. Reduce default_batch_interval_ms from 2000 to 500
  2. Improve is_transient_rpc_error with more robust revert detection
  3. Update config JSONs with chain-specific task_creation_buffer_window (10 for L1, 30 for Base)

⏺ Bash(git checkout main && git pull origin main 2>&1 | tail -5)
⎿  Switched to branch 'main'
Your branch is up to date with 'origin/main'.
From github.com:newt-foundation/newton-prover-avs
… +2 lines (ctrl+o to expand)

⏺ Bash(git checkout -b fix/batch-flush-interval-and-revert-detection)
⎿  Switched to a new branch 'fix/batch-flush-interval-and-revert-detection'

⏺ Now let me make the changes. First, reduce the batch flush interval.

Read 1 file (ctrl+o to expand)

⏺ Update(crates/gateway/src/rpc/api/batch_submitter.rs)
⎿  Added 1 line, removed 1 line
86 30
87 }
88 fn default_batch_interval_ms() -> u64 {
89 - 2000
89 + 500
90 }
91 fn default_respond_wait_timeout_ms() -> u64 {
92 60000

⏺ Now improve is_transient_rpc_error in the errors module.

Read 1 file (ctrl+o to expand)

⏺ Good β€” PR #500 proposed an execution reverted guard here. Let me check whether it's already on main.

⏺ Bash(git log --oneline main -5)
⎿  549b74d1 Merge pull request #489 from newt-foundation/feat/kms-to-hpke-migration
68ca2517 Merge pull request #496 from newt-foundation/fix/cancel-zombie-aggregator-slots
2be8b8a3 fix: rename Blocklist→Blacklist enum variant and hash for consistency
… +2 lines (ctrl+o to expand)

⏺ PR #500 hasn't been merged yet. So the current is_transient_rpc_error on main doesn't have the execution reverted guard. Let me write
the improved version that supersedes PR #500's fix.

⏺ Update(crates/chainio/src/avs/errors.rs)
⎿  Added 23 lines
235 ///
236 /// This centralizes the pattern matching previously duplicated across
237 /// batch_submitter.rs, tx_worker.rs, and sync.rs.
238 +///
239 +/// Detection strategy: check for contract revert indicators first (non-transient),
240 +/// then check for known transient patterns. This ordering prevents false positives
241 +/// where a revert error message happens to contain a transient keyword (e.g.,
242 +/// "Internal server error: ... execution reverted" matching "server error").
243 pub fn is_transient_rpc_error(error_str: &str) -> bool {
244 let lower = error_str.to_lowercase();
245 +
246 + // Contract reverts are never transient β€” the same calldata against the same
247 + // on-chain state will always produce the same revert. Alloy formats these in
248 + // multiple ways depending on the error path:
249 + // - "execution reverted" β€” standard EVM revert from eth_call or eth_sendTransaction
250 + // - "revert" with hex selector β€” decoded revert from alloy ContractError
251 + // - "error code: 3" β€” EIP-1474 JSON-RPC execution error code
252 + // - "ContractError" β€” alloy's typed contract error variant
253 + let revert_patterns = [
254 + "execution reverted",
255 + "error code: 3,",
256 + "contracterror",
257 + "revert 0x",
258 + ];
259 + if revert_patterns.iter().any(|p| lower.contains(p)) {
260 + return false;
261 + }
262 +
263 let transient_patterns = [
264 "nonce",
265 "timeout",

⏺ Now update the tests for is_transient_rpc_error.

Read 1 file (ctrl+o to expand)

⏺ Update(crates/chainio/src/avs/errors.rs)
⎿  Added 16 lines, removed 1 line
430 }
431
432 #[test]
433 - fn test_is_transient_rpc_error() {
433 + fn test_is_transient_rpc_error_transient() {
434 assert!(is_transient_rpc_error("nonce too low"));
435 assert!(is_transient_rpc_error("connection refused"));
436 assert!(is_transient_rpc_error("HTTP 429 Too Many Requests"));
437 assert!(is_transient_rpc_error("HTTP 503 Service Unavailable"));
438 assert!(is_transient_rpc_error("request timeout after 30s"));
439 assert!(is_transient_rpc_error("rate limit exceeded"));
440 + }
441 +
442 + #[test]
443 + fn test_is_transient_rpc_error_non_transient() {
444 assert!(!is_transient_rpc_error("TaskAlreadyExists(0x1234)"));
445 assert!(!is_transient_rpc_error("revert 0x2e98c533"));
446 + // Contract reverts containing transient keywords must NOT be classified as transient
447 + assert!(!is_transient_rpc_error(
448 + "Internal server error: Failed to batch create tasks:
449 + Alloy Contract Error TransportError(ErrorResp(execution reverted))"
450 + ));
451 + assert!(!is_transient_rpc_error("execution reverted: TaskCreatedBlockTooOld"));
452 + // EIP-1474 error code 3 = execution error
453 + assert!(!is_transient_rpc_error(
454 + "ErrorResp { code: 3, message: \"execution reverted\", data: Some(\"0xeb790a32\") }"
455 + ));
456 + assert!(!is_transient_rpc_error("Alloy Contract Error ContractError(...)"));
457 }
458 }

⏺ Now update the config files. The user wants a ~120s buffer for all chains: 10 blocks on L1 (12s/block) and 30 blocks on Base
(~2s/block on both Base Sepolia 84532 and Base mainnet 8453). Let me check current values and update.

The stagef config already has task_creation_buffer_window: 10 for all chains. I need to set Base chains to 30.

⏺ Update(contracts/newton_prover_config.stagef.json)
⎿  Added 1 line, removed 1 line
133 "task_response_window_block": 30,
134 "task_challenge_window_block": 30,
135 "epoch_blocks": 7200,
136 - "task_creation_buffer_window": 10,
136 + "task_creation_buffer_window": 60,
137 "is_challenge_enabled": true,
138 "sp1_verifier": "0x0000000000000000000000000000000000000000",
139 "sp1_program_vkey": "0x00b622d22f774a19aaef4ee67535d9c0eb2368c987e2a34fa6b52bdb9502375b"

⏺ Wait β€” Base Sepolia has ~2s block time, so 120s / 2s = 60 blocks, not 30. Let me also update the prod config.

⏺ Update(contracts/newton_prover_config.prod.json)
⎿  Added 1 line, removed 1 line
105 "task_response_window_block": 30,
106 "task_challenge_window_block": 30,
107 "epoch_blocks": 7200,
108 - "task_creation_buffer_window": 10,
108 + "task_creation_buffer_window": 60,
109 "is_challenge_enabled": true,
110 "sp1_verifier": "0x0000000000000000000000000000000000000000",
111 "sp1_program_vkey": "0x00b622d22f774a19aaef4ee67535d9c0eb2368c987e2a34fa6b52bdb9502375b"
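
The per-chain block counts follow from simple ceiling division over the target wall-clock buffer. A hypothetical helper (the real values are hand-set in the config JSONs; this is only illustrative):

```rust
/// Convert a wall-clock buffer target into a per-chain block count.
/// Ceiling division so a chain with an awkward block time never under-budgets.
fn buffer_blocks(target_secs: u64, block_time_secs: u64) -> u64 {
    target_secs.div_ceil(block_time_secs)
}
```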

⏺ Now update the description for TASK_CREATED_BLOCK_TOO_OLD to reflect the new default.

⏺ Update(crates/chainio/src/avs/errors.rs)
⎿  Added 1 line, removed 1 line
43 pub const TASK_CREATED_BLOCK_TOO_OLD: ExternalError = ExternalError {
44 selector: [0xeb, 0x79, 0x0a, 0x32],
45 name: "TaskCreatedBlockTooOld",
46 - description: "taskCreatedBlock is older than currentBlock - bufferWindow (default: 2 blocks)",
46 + description: "taskCreatedBlock is older than currentBlock - bufferWindow (default: 10 blocks on L1)",
47 };
48
49 /// TaskResponseWindowPassed()

⏺ Now let me run the tests and clippy to verify.

Bash(cargo test -p newton-prover-chainio -- errors 2>&1 | tail -20)
⎿  Running… (27s)

@denniswon

PR created: https://github.com/newt-foundation/newton-prover-avs/pull/501

This supersedes PR #500 (the execution reverted guard alone). Summary of changes:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Change β”‚ File β”‚ Impact β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Batch flush interval 2000ms β†’ 500ms β”‚ batch_submitter.rs β”‚ Items spend 75% less time in queue before on-chain β”‚
β”‚ β”‚ β”‚ submission β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Revert-first detection in β”‚ errors.rs β”‚ Covers 4 alloy revert formats, prevents retry loops on β”‚
β”‚ is_transient_rpc_error β”‚ β”‚ contract reverts β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Base Sepolia buffer 10 β†’ 60 blocks β”‚ config.{stagef,prod}.json β”‚ ~120s buffer matching L1 budget (was 20s with 2s β”‚
β”‚ β”‚ β”‚ blocks) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

For the live Base Sepolia contracts, you'll still need the cast send commands to update the on-chain value (config only applies on
next deploy/upgrade):

Base Sepolia stagef

cast send 0xda128a87412be6f2161bb87200f0f40e89744bd9 \
  "updateTaskCreationBufferWindow(uint32)" 60 \
  --rpc-url $BASE_SEPOLIA_RPC_URL --private-key $OWNER_PRIVATE_KEY

Base Sepolia prod

cast send 0xa5e104ad7f09df5d9036d1e9ad60fada11140071 \
  "updateTaskCreationBufferWindow(uint32)" 60 \
  --rpc-url $BASE_SEPOLIA_RPC_URL --private-key $OWNER_PRIVATE_KEY

Sepolia L1 contracts already have 10 on-chain β€” no action needed there.

✻ Worked for 11m 48s

⏺ Background command "Run chainio error tests" completed (exit code 0)

⏺ Background task completed (the earlier test run). Already confirmed passing β€” no action needed.

@denniswon

❯ From this PR https://github.com/newt-foundation/newton-prover-avs/pull/476, we introduced batch task manager. remind me what this
comment by a reviewer was about: https://github.com/newt-foundation/newton-prover-avs/pull/476#discussion_r2998223796, and why our
batch task settlement of transactions are all or nothing mechanism. Also, analyze why we must handle create task transactions and task
response transactions as a separate batch tx, instead of createTask<>respondToTask per task be handled as an atomic unit (maybe we
can have a function on TaskManager that combines createTask and taskResponse to one?). Lastly, analyze if we are actually getting
"anything" out of retries mechanism for batch transactions (should we remove it?).

⏺ Let me pull up the PR comment and the relevant contract/Rust code to answer all four questions.

⏺ Bash(gh api repos/newt-foundation/newton-prover-avs/pulls/476/comments 2>/dev/null | jq '.[] | select(.id == 2998223796) | {user:
.user.login, body: .body, path: .pa…)
⎿  {
"user": "albertnbrown",
"body": "By emitting an event here and not reverting, we are paying for the reverting execution. We need a try/catch to mak
… +8 lines (ctrl+o to expand)

⏺ Bash(gh api repos/newt-foundation/newton-prover-avs/pulls/476/comments --paginate 2>/dev/null | jq '.[] | {id: .id, user:
.user.login, body: .body[0:500], path: .pat…)
⎿  {
"id": 2998223796,
"user": "albertnbrown",
… +47 lines (ctrl+o to expand)

⏺ Now let me read the contract and the batch submitter to build the full picture.

Read 2 files (ctrl+o to expand)
⎿  Loaded crates/CLAUDE.md
⎿  Loaded crates/gateway/CLAUDE.md

⏺ Let me also check the TaskManager to understand the createβ†’respond lifecycle constraints.

Searched for 2 patterns, read 1 file (ctrl+o to expand)
⎿  Loaded contracts/CLAUDE.md
⎿  Loaded contracts/src/CLAUDE.md
⎿  Loaded .claude/rules/solidity.md

⏺ Good, I have all the context. Let me now read the key part of sync.rs to understand the createβ†’respond lifecycle timing.

Searched for 1 pattern (ctrl+o to expand)

⏺ Now I have the full picture. Here's the analysis across all four questions.


  1. Albert's Review Comment (PR #476)

Albert (@albertnbrown) was reviewing BatchTaskManager.sol at the point where the original implementation emitted an event for each
failed item and continued processing. His concern:

β–Ž By emitting an event here and not reverting, we are paying for the reverting execution [gas cost of the failed createNewTask
try/catch] plus the event emission gas β€” and the caller still has to parse events offchain to figure out what failed.

His suggestion was: collect failures into an array, then revert the entire transaction with structured failure data
(BatchPartialFailure(FailedItem[])). The rationale:

  • The caller (Rust gateway) can simulate the batch via eth_call before sending to detect poison items
  • If a single item reverts, the gateway strips it and resubmits the healthy remainder
  • Reverting is cheaper than committing events for failures (no storage writes, no log bloom)
  • The revert data carries FailedItem(index, taskId, reason) β€” richer than events for debugging

This became the "all-or-nothing" design we have today.


  2. Why All-or-Nothing (Revert Entire Batch on Partial Failure)

The contract uses try/catch per item but reverts the entire tx if any item fails. This is intentional for three reasons:

β˜… Insight ─────────────────────────────────────
Nonce simplicity: One tx = one nonce. If the contract committed partial results (some items on-chain, others not), the Rust side
would need to track per-item on-chain status to know what landed. Under 10 gateway replicas with receipt timeouts, this is unreliable
β€” you can't tell from a timed-out receipt which items committed.

Structured error data: A revert payload can carry FailedItem[] with the exact revert bytes per item. If the contract committed
partial results, there's no clean Solidity mechanism to return "these 3 failed" in a successful receipt. Events could work but are
harder/more expensive to parse.

Idempotent convergence: The strip-and-retry loop always converges. TaskAlreadyExists means the item IS on-chain (from a prior
attempt). Strip it, retry the rest. Each retry has fewer items. The gas cost of re-executing healthy items is modest compared to the
complexity of partial-commit bookkeeping.
─────────────────────────────────────────────────

The alternative β€” partial commit (commit successful items, return failure list) β€” would require:

  • Per-item storage writes to track what committed within the batch
  • A way to return partial success data (Solidity has no "return data on partial success")
  • The Rust side to handle partial receipts, which is fragile under receipt timeouts

  3. Why Create and Respond Can't Be Atomic (No createAndRespond Function)

This is the most important architectural question. There are three fundamental reasons why createTask and respondToTask can't be
combined into a single atomic createTaskAndRespond():

A. The BLS Aggregation Happens Between Create and Respond

The lifecycle is:

createTask (on-chain) β†’ operators see NewTaskCreated event β†’
operators evaluate policy β†’ operators BLS-sign β†’
aggregator collects signatures β†’ checks quorum β†’
respondToTask (on-chain, with aggregated BLS signature)

respondToTask requires signatureData β€” the aggregated BLS signature from N operators weighted by stake. This doesn't exist at create
time. There's a multi-second window (typically 2-10s) between create and respond where operators independently evaluate and sign.

A combined function would need the BLS signature at the same time as task creation, which is impossible β€” you can't have operators
sign something that doesn't exist on-chain yet.

B. taskCreatedBlock Is Used for BLS Verification

respondToTask calls verifyTaskResponse which uses checkSignatures(referenceBlockNumber=taskCreatedBlock). This verifies the BLS
signature against the operator set snapshot at the block when the task was created. The task must exist on-chain (with its
taskCreatedBlock assigned) before operators can produce a valid BLS signature referencing that block.

If you tried to combine them in one tx, taskCreatedBlock would be block.number of the combined tx. But operators would have needed to
sign against this exact block number BEFORE the tx was mined β€” a circular dependency.

C. Challenge Window Semantics

respondToTask checks block.number < task.taskCreatedBlock + taskResponseWindowBlock. The response must arrive within a time window
after creation. If both happen atomically, the window check is trivially satisfied (same block), which defeats the purpose of giving
operators time to evaluate. More importantly, challengers monitor NewTaskCreated events to independently re-evaluate β€” combining
would eliminate the challenge observation window.

β˜… Insight ─────────────────────────────────────
The create/respond split isn't a design choice we can reconsider β€” it's a fundamental property of BLS-aggregated AVS systems. The
signature aggregation phase between create and respond IS the decentralized computation. Collapsing it would mean either (a) a single
party produces the "aggregated" signature (centralized, defeats the point), or (b) you'd need a precommit scheme where operators
sign a future task, which introduces new trust assumptions and complexity.

The batch optimization lives at the edges: batch N creates into one tx, batch M responds into one tx. But each individual
create→respond pair inherently spans multiple blocks.
─────────────────────────────────────────────────


  4. Are Retries Actually Helping? Should We Remove Them?

Let me look at what the load test data actually showed about retries.

Searched for 4 patterns (ctrl+o to expand)

⏺ Now I have the full timeline. Here's the complete analysis:


  4. Are Retries Actually Helping? (Analysis from Load Test Data)

The load test timeline tells a clear story:

16:44:19 β€” Batch 1: submitted (1 task) β†’ succeeded at 16:44:27 (8s)
16:44:49 β€” Batch 2: submitted (1 task) β†’ receipt timeout at 16:45:50 (60s)
16:45:50 β€” Retry 1: same items β†’ receipt timeout at 16:46:50 (60s)
16:46:50 β€” Retry 2: simulate individually β†’ ALL items = poison (TaskAlreadyExists) β†’ dropped
16:47:51 β€” Batch 3: new items accumulated β†’ receipt timeout at 16:48:52 (60s)
16:48:52 β€” Retry: simulate individually β†’ ALL items = poison β†’ dropped
16:49:52 β€” Batch 4: receipt timeout β†’ retry at 16:50:54 β†’ simulate β†’ ALL poison β†’ dropped

The pattern: One batch succeeds cleanly. Every subsequent batch times out on receipt, retries once (another 60s timeout), then
simulates individually β€” where every item is TaskAlreadyExists because the FIRST attempt actually DID land on-chain. The simulation
treats TaskAlreadyExists as poison and drops everything.

What retries accomplished in this load test

Zero successful recoveries from retries. The retry mechanism consumed 60s per attempt (receipt timeout), only to discover that the
original tx had already committed. The simulation fallback then misclassified every item as poison.

Why retries fail for batch creates specifically

The core issue is that get_receipt() timeout does NOT mean the tx failed β€” it means we don't know. On Sepolia with variable block
times (12-36s) and RPC load, receipt polling exceeds 60s regularly. The tx is likely in the mempool or already mined. Retrying
resubmits identical calldata with a new nonce, which:

  1. If the first tx landed: every item reverts with TaskAlreadyExists
  2. If the first tx is still pending: nonce conflict (the second tx either replaces or queues behind it)

Neither outcome is recoverable via retry.

β˜… Insight ─────────────────────────────────────
Retries make sense for truly transient errors (RPC connection refused, 503, nonce-too-low from a stale cache). These are instant
failures β€” no ambiguity about whether the tx landed.

Retries are counterproductive for receipt timeouts. A receipt timeout means the tx was successfully submitted to the RPC node but we
couldn't confirm inclusion. The tx is SchrΓΆdinger's transaction β€” it may or may not be on-chain. Retrying with fresh calldata creates
a race condition that always ends in TaskAlreadyExists or nonce contention.

The fix in PR #505 (treating TaskAlreadyExists as idempotent success) mitigates the damage β€” retries no longer drop items. But the
60s timeout per retry attempt is still pure waste.
─────────────────────────────────────────────────

Should we remove retries entirely?

No β€” but we should differentiate retry strategy by error type:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Error Type β”‚ Current Behavior β”‚ Recommended β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ RPC connection error β”‚ Retry with backoff β”‚ Keep β€” these recover β”‚
β”‚ (instant fail) β”‚ β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Nonce too low (instant β”‚ Retry with backoff β”‚ Keep β€” nonce refresh fixes it β”‚
β”‚ fail) β”‚ β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Receipt timeout (60s β”‚ Retry with backoff β”‚ Don't retry the batch. Instead, check on-chain status of each item, β”‚
β”‚ ambiguous) β”‚ (another 60s) β”‚ then resubmit only unconfirmed items β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Contract revert (instant β”‚ Simulate individually β”‚ Keep β€” strips poison β”‚
β”‚ fail) β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The ideal receipt-timeout handler would be:

  1. Receipt times out β†’ don't retry the whole batch
  2. For each item, query allTaskHashes[taskId] on-chain (a cheap eth_call)
  3. Items already on-chain β†’ insert into confirmed_creates, trigger respond
  4. Items NOT on-chain β†’ resubmit in a new batch

This avoids the 60s retry timeout entirely and handles the ambiguity directly. But this is a bigger refactor β€” for now, PR #505's
idempotent handling means retries at least converge correctly (items get classified as already_exists instead of poison), even if
they waste 60s doing it.
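
A sketch of that reconciliation step β€” `is_on_chain` stands in for the cheap per-task allTaskHashes eth_call, and all names are hypothetical:

```rust
/// Reconcile a timed-out batch against actual on-chain state instead of
/// blindly retrying. Returns (confirmed, to_resubmit): confirmed items go to
/// the respond path, unconfirmed items go into a fresh batch.
fn reconcile_receipt_timeout(
    batch: Vec<String>,
    is_on_chain: impl Fn(&str) -> bool,
) -> (Vec<String>, Vec<String>) {
    batch.into_iter().partition(|task_id| is_on_chain(task_id))
}
```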

Quantifying the waste

In the load test: 4 receipt timeouts Γ— 60s each = 240 seconds of pure retry overhead that produced zero new on-chain results. With PR
#505's fix, those 240 seconds would at least end with items correctly classified as idempotent success instead of dropped. But the
latency cost to the user (who's waiting on the sync RPC call with a 30s timeout) means those tasks already timed out from the
caller's perspective.

Summary

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Question β”‚ Answer β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Are retries helping today? β”‚ No β€” 0 successful recoveries in load test β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Should we remove retries? β”‚ No β€” keep for instant-fail transient errors (connection, nonce) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Should we retry receipt timeouts? β”‚ No β€” should query on-chain status instead of blind retry β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Does PR #505 help? β”‚ Yes β€” retries now converge correctly instead of dropping items β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Longer term fix? β”‚ Receipt timeout β†’ on-chain status check β†’ resubmit only missing items β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

✻ Sautéed for 2m 7s

❯ for createAndResponse --> im not saying we should change the protocol, but we can add a "helper" function that just makes it 1
transaction (with same args/params) instead of 2 (create vs respond). that won't help?

⏺ You're right β€” I was thinking about it wrong. Let me reconsider.

By the time the gateway queues items to the batch submitter, it already has everything: the task params, the BLS aggregated
signature, and the response. The offchain BLS aggregation is complete. The only reason create and respond are separate on-chain calls
is that respondToTask reads state written by createNewTask:

// respondToTask checks this:
require(TaskLib.taskHash(task) == allTaskHashes[taskId], ...)

But within a single EVM transaction, state written earlier in the same execution IS visible to later calls. So a helper like:

function createAndRespondToTask(
    Task calldata task,
    TaskResponse calldata taskResponse,
    bytes calldata signatureData
) external onlyAuthorized {
    taskManager.createNewTask(task); // writes allTaskHashes[taskId]
    taskManager.respondToTask(task, taskResponse, signatureData); // reads it
}

...would work. Both operations execute in the same block.number, and:

  • taskCreatedBlock < block.number β€” satisfied because gateway sets taskCreatedBlock = current_block - offset (offset >= 1)
  • taskCreatedBlock + taskCreationBufferWindow >= block.number β€” satisfied (same constraint as today)
  • checkSignatures(referenceBlockNumber = taskCreatedBlock) β€” the BLS signature was already computed offchain against this block
  • block.number <= taskCreatedBlock + taskResponseWindowBlock β€” trivially satisfied (same block)

β˜… Insight ─────────────────────────────────────
This is actually a significant architectural simplification. The entire confirmed_creates DashMap, the separate create/respond
accumulators, the ordering invariant, and the TaskAlreadyExists cascade β€” all of that complexity exists because create and respond
are separate transactions. Combining them eliminates the root cause of most failures we saw in load testing.

What it fixes:

  • No more ordering invariant: No need for confirmed_creates DashMap to gate respond flush
  • Half the nonce usage: 1 tx per task instead of 2 β†’ halves nonce contention
  • No receipt-timeout cascade: A timeout on the combined tx means we check on-chain status once, not twice
  • No TaskAlreadyExists β†’ respond blocked: The respond can't be orphaned from its create
  • ~30% gas savings: One base tx cost (21K) instead of two; one batch call overhead instead of two
    ─────────────────────────────────────────────────

The batched version would be:

function batchCreateAndRespondToTasks(
    Task[] calldata tasks,
    TaskResponse[] calldata responses,
    bytes[] calldata signatureDataArray
) external onlyAuthorized {
    require(tasks.length > 0, EmptyBatch());
    require(tasks.length == responses.length && responses.length == signatureDataArray.length, ArrayLengthMismatch());

    FailedItem[] memory failures = new FailedItem[](tasks.length);
    uint256 failCount;

    for (uint256 i; i < tasks.length;) {
        try this._createAndRespond(tasks[i], responses[i], signatureDataArray[i]) {}
        catch (bytes memory reason) {
            failures[failCount] = FailedItem(i, responses[i].taskId, reason);
            unchecked { ++failCount; }
        }
        unchecked { ++i; }
    }

    if (failCount > 0) {
        FailedItem[] memory trimmed = new FailedItem[](failCount);
        for (uint256 j; j < failCount;) { trimmed[j] = failures[j]; unchecked { ++j; } }
        revert BatchPartialFailure(trimmed);
    }
}

function _createAndRespond(
    Task calldata task,
    TaskResponse calldata taskResponse,
    bytes calldata signatureData
) external {
    require(msg.sender == address(this), Unauthorized()); // only callable via try/catch from batch
    taskManager.createNewTask(task);
    taskManager.respondToTask(task, taskResponse, signatureData);
}

On the Rust side, the BatchTransactionSubmitter simplifies dramatically β€” a single accumulator instead of two, no confirmed_creates
DashMap, no respond-waits-for-create ordering logic.

The only thing to watch: if createNewTask succeeds but respondToTask reverts within the _createAndRespond call, the entire item
reverts (including the create). That's actually better than today β€” you never get a zombie created-but-not-responded task. The item
either fully succeeds or fully fails.

Want me to implement this?

@denniswon
Author

 Combined Batch Create+Respond via BatchTaskManager

 Context

 Load testing on Sepolia (10 gateway replicas, 20 concurrent tasks) showed a cascading failure pattern: receipt timeouts on batch
 creates trigger retries, items already on-chain cause TaskAlreadyExists reverts, the simulation fallback misclassifies them as
 poison, and the respond accumulator is blocked because confirmed_creates never gets populated. Root cause: create and respond are
 separate transactions with a confirmed_creates DashMap ordering bridge between them.

 By the time the gateway queues items to the batch submitter, ALL data is ready β€” task params, BLS aggregated signature, and
 response. The offchain BLS aggregation is already complete. There is no protocol-level reason to submit create and respond as
 separate on-chain transactions. Within a single EVM transaction, state written by createNewTask (allTaskHashes[taskId]) is
 immediately visible to respondToTask.

 What This Eliminates

 - The confirmed_creates DashMap and its ordering invariant
 - The separate respond_accumulator_loop and respond_tx channel
 - The TaskAlreadyExists β†’ respond-blocked cascade (the primary load test failure)
 - Half the nonce usage (1 tx per task instead of 2)
 - The BatchRespondItem struct and submit_aggregated_response batch path
 - Receipt timeout retry cascade between create and respond

 Contract Constraints (Why This Works)

 - taskCreatedBlock < block.number β€” satisfied because gateway sets taskCreatedBlock = current_block - offset (offset >= 1)
 - taskCreatedBlock + taskCreationBufferWindow >= block.number β€” same constraint as today
 - checkSignatures(referenceBlockNumber = taskCreatedBlock) β€” BLS signature already computed offchain against this block
 - block.number <= taskCreatedBlock + taskResponseWindowBlock β€” trivially satisfied (same block)
 - allTaskHashes[taskId] written by createNewTask β€” visible within same EVM execution to respondToTask

 If createNewTask succeeds but respondToTask reverts within the same item, the entire item reverts (including the create). This is
 better than today β€” no zombie created-but-not-responded tasks.

 Files to Modify

 Solidity (contract + interface + bindings)

 1. contracts/src/BatchTaskManager.sol β€” Add batchCreateAndRespondToTasks + internal _createAndRespond
 2. contracts/src/interfaces/IBatchTaskManager.sol β€” Add interface method
 3. Run make generate-bindings to regenerate Rust ABI bindings

 Rust (gateway)

 4. crates/chainio/src/avs/writer.rs β€” Add batch_create_and_respond_to_tasks method on AvsWriter
 5. crates/gateway/src/task/submitter.rs β€” Add batch_create_and_respond_to_tasks to TaskSubmitter trait
 6. crates/gateway/src/rpc/api/batch_submitter.rs β€” Replace dual accumulators with single combined accumulator
 7. crates/gateway/src/rpc/api/sync.rs β€” Replace submit_create_task + submit_aggregated_response with single submit_task call

 Mock/test impls (add stub method)

 8. 8 TaskSubmitter impls β€” Add batch_create_and_respond_to_tasks stub
 9. Existing batch_submitter tests β€” Update for combined item type

 Implementation Steps

 Step 1: Add batchCreateAndRespondToTasks to BatchTaskManager contract

 File: contracts/src/BatchTaskManager.sol

 /// @inheritdoc IBatchTaskManager
 function batchCreateAndRespondToTasks(
     INewtonProverTaskManager.Task[] calldata tasks,
     INewtonProverTaskManager.TaskResponse[] calldata responses,
     bytes[] calldata signatureDataArray
 ) external onlyAuthorized {
     require(tasks.length > 0, EmptyBatch());
     require(
         tasks.length == responses.length && responses.length == signatureDataArray.length,
         ArrayLengthMismatch()
     );

     FailedItem[] memory failures = new FailedItem[](tasks.length);
     uint256 failCount;

     for (uint256 i; i < tasks.length;) {
         try this._createAndRespond(tasks[i], responses[i], signatureDataArray[i]) {}
         catch (bytes memory reason) {
             failures[failCount] = FailedItem(i, responses[i].taskId, reason);
             unchecked { ++failCount; }
         }
         unchecked { ++i; }
     }

     if (failCount > 0) {
         FailedItem[] memory trimmed = new FailedItem[](failCount);
         for (uint256 j; j < failCount;) {
             trimmed[j] = failures[j];
             unchecked { ++j; }
         }
         revert BatchPartialFailure(trimmed);
     }
 }

 /// @dev Internal helper called via this._createAndRespond() so try/catch works per item.
 ///      Must be external for try/catch but restricted to self-calls only.
 function _createAndRespond(
     INewtonProverTaskManager.Task calldata task,
     INewtonProverTaskManager.TaskResponse calldata taskResponse,
     bytes calldata signatureData
 ) external {
     require(msg.sender == address(this), Unauthorized());
     taskManager.createNewTask(task);
     taskManager.respondToTask(task, taskResponse, signatureData);
 }

 Update contracts/src/interfaces/IBatchTaskManager.sol to add:
 function batchCreateAndRespondToTasks(
     INewtonProverTaskManager.Task[] calldata tasks,
     INewtonProverTaskManager.TaskResponse[] calldata responses,
     bytes[] calldata signatureDataArray
 ) external;

 Step 2: Regenerate Rust bindings

 make generate-bindings

 This produces updated crates/core/src/generated/batch_task_manager.rs with the new function.

 Step 3: Add batch_create_and_respond_to_tasks to AvsWriter

 File: crates/chainio/src/avs/writer.rs

 Follow the same pattern as existing batch_create_tasks and batch_respond_to_tasks:
 - ABI-convert task/response types via encode/decode round-trip
 - Call contract.batchCreateAndRespondToTasks(tasks, responses, sigs).send().await
 - Wrap get_receipt() in tokio::time::timeout(BATCH_RECEIPT_TIMEOUT)
 - Check receipt.status()
 - On revert: attempt parse_batch_partial_failure()
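The receipt-timeout wrapping in the bullets above can be modeled with std channels. This is a simplified stand-in: the real code wraps an alloy-style get_receipt() future in tokio::time::timeout, and `BATCH_RECEIPT_TIMEOUT` here is an arbitrary illustrative value.

```rust
use std::sync::mpsc;
use std::time::Duration;

/// Simplified receipt wait: Ok(receipt) if it arrives within the deadline,
/// Err(()) on timeout β€” mirroring tokio::time::timeout around get_receipt().
fn wait_for_receipt(rx: mpsc::Receiver<String>, deadline: Duration) -> Result<String, ()> {
    rx.recv_timeout(deadline).map_err(|_| ())
}
```

The important behavior is that a timeout is an ambiguous outcome: the transaction may still have landed, which is exactly why the follow-up logic must check on-chain state rather than blindly retry.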

 Step 4: Add to TaskSubmitter trait

 File: crates/gateway/src/task/submitter.rs

 async fn batch_create_and_respond_to_tasks(
     &self,
     batch_contract_addr: Address,
     tasks: Vec<Task>,
     responses: Vec<TaskResponse>,
     signature_data_array: Vec<Bytes>,
 ) -> Result<TransactionReceipt>;

 Keep existing batch_create_tasks and batch_respond_to_tasks β€” they remain useful for edge cases (direct on-chain tasks from
 watcher path only need respond, legacy fallback).

 Step 5: Rewrite BatchTransactionSubmitter with single accumulator

 File: crates/gateway/src/rpc/api/batch_submitter.rs

 New combined item type:
 struct BatchCombinedItem {
     task_id: TaskId,
     params: SendTaskParams,                           // for building Task
     task_request: TaskRequest,                        // for building respond params
     service_response: BlsAggregationServiceResponse,  // aggregated BLS data
     result_tx: oneshot::Sender<Result<(TransactionReceipt, Task)>>,
     queued_at: Instant,
 }

 New public API β€” single method replaces two:
 pub async fn submit_task(
     &self,
     task_id: TaskId,
     params: SendTaskParams,
     task_request: TaskRequest,
     service_response: BlsAggregationServiceResponse,
 ) -> Result<Receiver<Result<(TransactionReceipt, Task)>>>

 Single accumulator loop replaces create_accumulator_loop + respond_accumulator_loop:
 - Collects items, flushes on deadline (max age) or batch size
 - At flush time: calls aggregator.build_respond_params() for each item to get (Task, TaskResponse, Bytes) triples
 - Submits via task_submitter.batch_create_and_respond_to_tasks()
 - On success: sends Ok((receipt, task)) through each item's oneshot
 - On failure: same strip-and-retry as current flush_create_batch_with_retry
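The flush policy described above (flush on batch size or max age, whichever comes first) can be sketched independently of the channel plumbing. `Accumulator` and its thresholds are illustrative, not the gateway's real types.

```rust
use std::time::{Duration, Instant};

/// Minimal accumulator: flush when the batch reaches `max_size` or the
/// oldest queued item has waited at least `max_age` (checked each tick).
struct Accumulator<T> {
    items: Vec<(T, Instant)>,
    max_size: usize,
    max_age: Duration,
}

impl<T> Accumulator<T> {
    fn new(max_size: usize, max_age: Duration) -> Self {
        Self { items: Vec::new(), max_size, max_age }
    }

    fn push(&mut self, item: T) {
        self.items.push((item, Instant::now()));
    }

    /// Drains and returns the batch if either flush condition holds.
    fn maybe_flush(&mut self, now: Instant) -> Option<Vec<T>> {
        let oldest_expired = self
            .items
            .first()
            .map_or(false, |(_, queued)| now.duration_since(*queued) >= self.max_age);
        if self.items.len() >= self.max_size || oldest_expired {
            Some(self.items.drain(..).map(|(item, _)| item).collect())
        } else {
            None
        }
    }
}
```

This shape keeps flush timing deterministic: a single deadline per batch, with no second accumulator to coordinate against.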

 Remove:
 - confirmed_creates: Arc<DashMap<TaskId, ()>> β€” no longer needed
 - respond_tx: mpsc::Sender<BatchRespondItem> β€” no longer needed
 - respond_accumulator_loop() β€” replaced by combined loop
 - flush_respond_batch() β€” logic merged into combined flush
 - BatchRespondItem struct
 - submit_aggregated_response() method on BatchTransactionSubmitter
 - submit_create_task() method β€” replaced by submit_task()
 - respond_notify channel between create and respond loops
 - mark_task_created() β€” no longer needed (was for watcher path pre-registration)

 Keep:
 - flush_create_batch_with_retry logic (reuse for combined flush, rename to flush_batch_with_retry)
 - identify_poison_items (adapts to combined items)
 - Panic isolation via catch_unwind
 - Deadline-based flush timing

 Step 6: Update sync.rs caller

 File: crates/gateway/src/rpc/api/sync.rs

 Current flow (two-step):
 Line 240-249: submit_create_task(task_id, params) β†’ Receiver
 Line 614:     tokio::spawn background task
 Line 626:     await task_creation_rx (wait for create receipt)
 Line 745-747: batch.submit_aggregated_response(task_id, task_request, service_response)

 New flow (single-step):
 After BLS aggregation completes:
   batch.submit_task(task_id, params, task_request, service_response) β†’ Receiver
   tokio::spawn background task:
     await result_rx (wait for combined receipt)
     On success: log, done
     On TaskAlreadyExists+TaskAlreadyResponded: idempotent success
     On error: cancel_task, cleanup

 The submit_task call moves AFTER BLS aggregation (around line 560-580), not at line 240 where submit_create_task currently lives.
 This is because service_response isn't available until aggregation completes.

 The early submit_create_task at line 240 was there so the create could be pipelined with BLS aggregation. With the combined
 approach, the create waits for BLS aggregation to complete first. The added latency is minimal: BLS aggregation takes 2-5s, and
 the batch accumulator adds ~1s max age. The overall task latency increases by at most the batch flush interval (1s), but we save
 one entire transaction and eliminate the cascading failure mode.

 Step 7: Handle watcher path (direct on-chain tasks)

 The watcher path (handler/mod.rs:watcher_event_handler) creates tasks directly on-chain (not via gateway RPC). These tasks already
  exist on-chain β€” they only need respond, not create. The current code calls mark_task_created() to pre-register in
 confirmed_creates.

 With the combined approach, the watcher path should continue using batch_respond_to_tasks (respond-only). Keep
 submit_aggregated_response specifically for the watcher path, using the existing batch_respond_to_tasks contract method. This
 means the respond accumulator loop stays but ONLY serves watcher-originated tasks.

 Alternative: if watcher volume is low, use aggregator.submit_aggregated_response() directly (non-batched) for watcher tasks. This
 avoids maintaining a second accumulator for a rare path.

 Step 8: Update mock TaskSubmitter implementations (8 impls)

 Add batch_create_and_respond_to_tasks returning Ok(dummy_receipt) or appropriate test response to all 8 impls:
 - SuccessSubmitter (batch_submitter.rs test)
 - FailingSubmitter (batch_submitter.rs test)
 - PoisonDetectingSubmitter (batch_submitter.rs test)
 - StallTaskSubmitter (tx_worker.rs test)
 - MockAvsWriter (chain_routing.rs test)
 - MockAvsWriter (common/mod.rs test)
 - MockAvsWriter (send_task.rs test)
 - DummySubmitter (secrets_encrypted.rs test)

 Step 9: Update batch_submitter tests

 Existing tests (test_batch_create_success, test_batch_create_with_poison_items, test_batch_create_panic_isolation,
 test_batch_submit_multiple_items) use submit_create_task. Update to use submit_task with the combined item type. Test scenarios:
 - Happy path: all items create+respond in one tx
 - Poison item: one item fails create (stripped, rest succeed)
 - TaskAlreadyExists: prior attempt landed (treated as idempotent success)
 - Panic isolation: panic in build_respond_params doesn't kill accumulator

 Failure Handling

 Same three-category model, now applied to combined items:

 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚     Category      β”‚                        Selector / Pattern                        β”‚                 Action                 β”‚
 β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
 β”‚ Idempotent        β”‚ TaskAlreadyExists (0x2e98c533) or TaskAlreadyResponded           β”‚ Strip from batch, send success to      β”‚
 β”‚ success           β”‚ (0x68905dff)                                                     β”‚ caller                                 β”‚
 β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
 β”‚ Non-recoverable   β”‚ TaskMismatch, BLS verification failure, ABI decode failure       β”‚ Strip from batch, send error to caller β”‚
 β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
 β”‚ Transient         β”‚ RPC timeout, nonce conflict, connection error                    β”‚ Retry whole remaining batch with       β”‚
 β”‚                   β”‚                                                                  β”‚ backoff                                β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

 Key difference from current design: if createNewTask succeeds but respondToTask fails within the same _createAndRespond call, the
 entire item reverts including the create. This is better β€” no zombie tasks.
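The three-category model in the table maps onto a simple partition at flush time. The enum and classifier below are illustrative; the real code derives the category from revert selectors and transport errors.

```rust
/// Illustrative outcome classification for one batch item.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Outcome {
    IdempotentSuccess, // TaskAlreadyExists / TaskAlreadyResponded
    NonRecoverable,    // TaskMismatch, BLS verification failure, ABI decode failure
    Transient,         // RPC timeout, nonce conflict, connection error
}

/// Split a failed batch: idempotent items complete successfully, poison items
/// error out to their callers, and only transient items are retried.
fn partition_batch<T: Clone>(items: &[(T, Outcome)]) -> (Vec<T>, Vec<T>, Vec<T>) {
    let mut succeed = Vec::new();
    let mut fail = Vec::new();
    let mut retry = Vec::new();
    for (item, outcome) in items {
        match outcome {
            Outcome::IdempotentSuccess => succeed.push(item.clone()),
            Outcome::NonRecoverable => fail.push(item.clone()),
            Outcome::Transient => retry.push(item.clone()),
        }
    }
    (succeed, fail, retry)
}
```

The point of the partition is that a retry batch never contains an item whose fate is already known β€” the cause of the poison cascades seen in the load test.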

 Migration Path

 This is a contract change, so it requires:
 1. Deploy updated BatchTaskManager on all chains (Sepolia, Base Sepolia, mainnet when ready)
 2. The old batchCreateTasks and batchRespondToTasks methods remain β€” backward compatible
 3. Rust gateway uses the new method when available, falls back to separate calls if the contract doesn't have it

 For initial rollout, we can deploy the new contract first, then merge the Rust changes.

 Verification

 1. forge test β€” Solidity tests for new batchCreateAndRespondToTasks method
 2. make generate-bindings β€” Rust bindings regenerate cleanly
 3. make fmt && make clippy β€” no warnings
 4. cargo test -p newton-prover-gateway --lib β€” all gateway unit tests pass
 5. make gateway_e2e β€” single-chain E2E
 6. make e2e scenario=basic β€” unified E2E (production config path)

@denniswon
Author

Load Test Analysis

Consensus Mode Segmentation

All 15 tasks used centralized mode (centralized_unified_quorum). Zero two-phase consensus or threshold tasks were in this load test.
So the failures are entirely in the batch submission pipeline, not consensus-mode-specific.

Timeline Reconstruction

Phase 1: Success (23:24:55 – 23:26:39) β€” 9 tasks across 7 batch flushes, all succeed

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Batch Flush β”‚ Items β”‚ Time to Confirm β”‚ Outcome β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:24:55 β”‚ 2 β”‚ 8s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:25:09 β”‚ 1 β”‚ 22s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:25:31 β”‚ 1 β”‚ 8s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:25:40 β”‚ 1 β”‚ 13s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:26:09 β”‚ 2 β”‚ 8s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:26:25 β”‚ 1 β”‚ 15s β”‚ all succeeded β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 23:26:39 β”‚ 1 β”‚ ... β”‚ cascade starts here β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 2: Cascade Begins (23:26:39 – 23:26:54) β€” Batch 8 (1 item)

  1. 23:26:39 β€” Batch 8 flushed (1 item for task 0xaf9d...)
  2. 23:26:54 β€” TransportError(NullResp) β€” RPC returned null (likely Sepolia node hiccup)
  3. Retry β†’ gas estimation call β†’ contract reverts (task already on-chain from the first attempt that got NullResp)
  4. could not parse revert data β€” parse_batch_partial_failure returns None (the known TODO)
  5. Simulation detects idempotent success β†’ 2 items stripped as "already on-chain"
  6. No healthy items remaining β†’ batch complete. These 2 tasks actually succeeded (idempotent).

Phase 3: Fatal Cascade (23:26:54 – 23:30:57) β€” Batch 9 (1 item) and Batch 10 (5 items)

Batch 9 (1 item, flushed 23:26:54):

  • 23:27:54 β€” Receipt timeout (60s) β€” tx likely dropped from mempool
  • Classified as transient β†’ retry
  • 23:28:55 β€” Receipt timeout again (60s)
  • Classified as transient β†’ retry
  • 23:28:55 β€” gas estimation reverts (item on-chain from prior attempt)
  • could not parse revert data β†’ simulation
  • Simulation classifies as poison (not idempotent) β†’ 1 "combined create+respond failed"
  • Batch 10 (5 items) flushed at 23:28:56

Batch 10 (5 items, flushed 23:28:56):

  • 23:29:56 β€” Receipt timeout (60s)
  • Classified as transient β†’ retry
  • 23:29:56 β€” gas estimation reverts
  • could not parse revert data β†’ simulation
  • 23:30:57 β€” ALL 5 items classified as poison β†’ 5 "combined create+respond failed"

Phase 4: Zombie Aggregators (23:30:51 – 23:34:06) β€” All 15 BLS aggregator slots expire

All 15 tasks had BLS aggregation slots that were never cancelled. They expired after 360s (6 min), each having processed 0
signatures. The "failed to send task expired" error on 6 of them means the oneshot::Receiver was already dropped (the caller in
sync.rs moved on after batch failure).

Root Causes (Ordered by Severity)

  1. parse_batch_partial_failure always returns None (writer.rs:795)
    The as_revert_data() API exists and is already used at writer.rs:1091, but the batch parser is a TODO stub. This means every contract
    revert during retry falls through to simulation, which is slow and unreliable.

  2. Simulation misclassifies TaskAlreadyExists inside BatchPartialFailure as poison
    When identify_combined_poison_items tests a single item, the contract wraps TaskAlreadyExists inside BatchPartialFailure. The
    is_task_already_exists() check sees BatchPartialFailure, not TaskAlreadyExists, and classifies the item as poison instead of
    idempotent.

  3. Receipt timeout (60s) is classified as transient and retried with same items
    After timeout, the first tx may have already landed. Retrying the same batch guarantees a contract revert. The system should
    pre-check on-chain state before retrying after a timeout.

  4. BLS aggregator slots are never cancelled on batch failure
    The cancel_task call exists in sync.rs error paths, but the batch submitter communicates failure via the oneshot channel. If the
    sync.rs caller has already timed out or dropped the receiver, cancel_task is never called. The slots linger for 360s consuming
    memory.

  5. Second batch (5 items) flushed while prior batch is still retrying
    The accumulator loop flushes new items even though the prior batch is stuck in the 60s receipt timeout loop. These 5 items enter the
    same failing retry cascade.
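Root causes 1 and 2 both reduce to inspecting revert selectors, including selectors nested inside a BatchPartialFailure payload. A byte-level sketch follows; the two task selectors are taken from the failure-handling table above, but the BatchPartialFailure selector and the payload scan are placeholders β€” real code should ABI-decode the FailedItem[] array rather than scan raw bytes.

```rust
// Selectors from the failure-handling table; the batch selector is a placeholder.
const TASK_ALREADY_EXISTS: [u8; 4] = [0x2e, 0x98, 0xc5, 0x33];
const TASK_ALREADY_RESPONDED: [u8; 4] = [0x68, 0x90, 0x5d, 0xff];
const BATCH_PARTIAL_FAILURE: [u8; 4] = [0xde, 0xad, 0xbe, 0xef]; // placeholder, not the real selector

/// True if the revert data signals idempotent success, either at the top
/// level or nested inside a BatchPartialFailure payload. A real
/// implementation would ABI-decode FailedItem[] instead of scanning bytes.
fn is_idempotent_revert(data: &[u8]) -> bool {
    if data.len() < 4 {
        return false;
    }
    if data[..4] == TASK_ALREADY_EXISTS || data[..4] == TASK_ALREADY_RESPONDED {
        return true;
    }
    if data[..4] == BATCH_PARTIAL_FAILURE {
        // Look for a nested idempotent selector anywhere in the payload.
        return data[4..]
            .windows(4)
            .any(|w| w == TASK_ALREADY_EXISTS || w == TASK_ALREADY_RESPONDED);
    }
    false
}
```

This is the check the simulation path was missing: it saw BatchPartialFailure at the top level and never looked inside for TaskAlreadyExists.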

Quantitative Summary

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Metric β”‚ Value β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Total tasks β”‚ 15 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Successfully on-chain β”‚ 9 (60%) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Idempotent success (recovered) β”‚ 2 (13%) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Failed (misclassified as poison) β”‚ 6 (40%) β€” all recoverable β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Zombie aggregator slots β”‚ 15 (100%) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Receipt timeouts β”‚ 4 (across 2 batch retry cycles) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Time wasted in retry loops β”‚ ~4 min β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The core issue hasn't changed from the prior session: implementing parse_batch_partial_failure would have prevented 100% of the
failures. The 6 failed tasks were all on-chain already β€” they were just misidentified as poison because the revert data couldn't be
decoded.

Want me to proceed with implementing the fixes? The priority order would be:

  1. Implement parse_batch_partial_failure (the single-function fix that unblocks the whole chain)
  2. Fix simulation to detect TaskAlreadyExists nested inside BatchPartialFailure
  3. Add pre-check on-chain state before retry after receipt timeout
  4. Ensure cancel_task is called from the batch submitter on final failure

@denniswon
Author

❯ we need to fix all these issues. proceed. But we need much more fundamental fix to prevent Time wasted in retry loops (not
acceptable). we should leave NO TODOs at all in regards to this problem related items. Review this part of the codebase for onchain
transaction submission handling from the first principles and perform a thorough analysis on the architecture/design/code/logic before
implementing. /grill-me

⏺ Skill(grill-me)
⎿  Successfully loaded skill

⏺ Let me start by thoroughly reading the current batch submission pipeline to understand the architecture from first principles before
asking questions.

⏺ Explore(Read batch submission pipeline)
⎿  Done (56 tool uses · 143.1k tokens · 2m 16s)
(ctrl+o to expand)

⏺ Good, I have the full architecture. Now let me read the exact code for the critical paths to ground my understanding.

Read 2 files (ctrl+o to expand)
⎿  Loaded crates/CLAUDE.md
⎿  Loaded crates/chainio/src/CLAUDE.md
⎿  Loaded crates/chainio/src/avs/CLAUDE.md
⎿  Loaded crates/gateway/CLAUDE.md
⎿  Loaded crates/gateway/src/CLAUDE.md
⎿  Loaded crates/gateway/src/rpc/CLAUDE.md
⎿  Loaded crates/gateway/src/rpc/api/CLAUDE.md

⏺ Now let me read the identify_combined_poison_items function and the error type flow.

Read 2 files (ctrl+o to expand)

⏺ Now let me check how the error flows across the trait boundary β€” the TaskSubmitter wraps ChainIoError into GatewayError, which is the
key information loss point.

Read 2 files (ctrl+o to expand)
⎿  Loaded crates/gateway/src/task/CLAUDE.md

⏺ Now I have the complete picture of the architecture. Here's my first question:


Question 1: The TaskSubmitter trait erases ChainIoError into GatewayError::Internal(String). Should we change the trait's return type
to preserve structured errors?

The fundamental design flaw is here in submitter.rs:105-114:

async fn batch_create_and_respond_to_tasks(...) -> Result<TransactionReceipt, GatewayError> {
    AvsWriter::batch_create_and_respond_to_tasks(...)
        .await
        .map_err(|e| GatewayError::Internal(format!("Failed to batch create and respond: {e}")))
}

This .map_err(format!(...)) destroys ALL structured error data. The ChainIoError::BatchPartialFailure { failures } with per-item
indices and reasons becomes the string "Failed to batch create and respond: Batch partial failure: 3 items failed". Then
extract_failed_indices in batch_submitter.rs tries to recover that data from the string β€” and obviously can't.

My recommendation: Change the TaskSubmitter trait's batch methods to return Result<TransactionReceipt, ChainIoError> directly instead
of Result<TransactionReceipt, GatewayError>. The batch submitter is the only consumer of these methods, and it needs the structured
error data. This is a one-layer change β€” batch_submitter.rs already imports ChainIoError types via newton_prover_chainio. The
GatewayError wrapping would move to the callers that actually need it (sync.rs), not happen at the trait boundary.

The alternative is to implement parse_batch_partial_failure in writer.rs so the structured data is decoded before it crosses the
trait boundary β€” but that only solves one error type. The trait-level fix solves all current and future structured error needs.
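To make the information loss concrete, here is a minimal self-contained sketch (the enum is a hypothetical stand-in, not the real ChainIoError definition) of what the format! erasure destroys versus what a structured return type preserves:

```rust
// Hypothetical stand-in for the real ChainIoError; only the shape matters here.
#[derive(Debug)]
enum ChainIoError {
    BatchPartialFailure { failed_indices: Vec<usize> },
    Rpc(String),
}

// What the trait boundary does today: collapse everything into a string.
fn erase(e: ChainIoError) -> String {
    format!("Failed to batch create and respond: {e:?}")
}

// What the proposed boundary allows: recover per-item indices directly.
fn failed_indices(e: &ChainIoError) -> Option<&[usize]> {
    match e {
        ChainIoError::BatchPartialFailure { failed_indices } => Some(failed_indices.as_slice()),
        _ => None,
    }
}

fn main() {
    // The string form can only be recovered by fragile substring matching...
    let err = ChainIoError::BatchPartialFailure { failed_indices: vec![1, 3] };
    assert!(erase(err).contains("BatchPartialFailure"));
    // ...while the enum form hands the batch submitter the exact indices.
    let err = ChainIoError::BatchPartialFailure { failed_indices: vec![1, 3] };
    assert_eq!(failed_indices(&err), Some(&[1usize, 3][..]));
}
```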

Which approach do you prefer?

✻ Crunched for 3m 5s

❯ agree with your recommendation

⏺ Question 2: Receipt timeout currently returns ChainIoError::CreateNewTaskCallFail (a string). After a timeout, the tx may have landed
on-chain. Should the batch submitter check on-chain state before retrying?

Currently at writer.rs:759-763:
let receipt = timeout(Self::BATCH_RECEIPT_TIMEOUT, pending.get_receipt())
    .await
    .map_err(|_| ChainIoError::CreateNewTaskCallFail {
        reason: "batch create+respond receipt timed out after 60s".to_string(),
    })?

This is classified as transient by is_transient_rpc_error (matches "timeout"), so the batch submitter retries the same items. But the
tx may have already been mined β€” the receipt poll just didn't catch it. The retry sends a duplicate tx, which reverts with
BatchPartialFailure(TaskAlreadyExists), triggering the broken simulation fallback.

My recommendation: Two changes working together:

  1. Return a distinct error variant β€” ChainIoError::TransactionTimeout { timeout_secs: 60 } (already exists in the enum but unused).
    This lets the batch submitter handle timeouts differently from other transient errors.
  2. In the batch submitter, on receipt timeout: instead of blind retry, call identify_combined_poison_items immediately. This
    simulates each item as a single-item batch. Items already on-chain get TaskAlreadyExists β†’ idempotent success. Items not on-chain
    pass simulation β†’ healthy, re-batched and retried. This eliminates the entire 60sΓ—3 retry cascade (the "4 minutes wasted" from the
    load test).

The alternative is adding an on-chain eth_call check per item (e.g., querying allTaskHashes[taskId]), but simulation already does
this implicitly and handles edge cases like partially-landed batches. The simulation path just needs the TaskAlreadyExists detection
to work β€” which it will once we fix the parse_batch_partial_failure at the writer level.

Does this approach make sense, or do you want a direct on-chain state check instead?

❯ what is your recommendation? is simulation an overkill compared to just eth call?

⏺ Good pushback. Let me think about this concretely.

Simulation = call batch_create_and_respond_to_tasks([single_item]) per item. This is a full contract call that tries to execute
create+respond, consuming significant gas estimation compute on the RPC node. For N items, that's N separate eth_calls, each encoding
the full batch contract ABI with task/response/signature data.

Direct on-chain check = call taskManager.allTaskHashes(taskId) per item. This is a single storage slot read β€” trivially cheap, ~200
gas equivalent, returns bytes32(0) if not created or the hash if created.

For the combined path, if allTaskHashes(taskId) != bytes32(0), the task was created AND responded (atomic), so it's idempotent
success. If it's zero, the task never landed β€” safe to retry.

My revised recommendation: Direct on-chain check via allTaskHashes. It's:

  • 10-100x cheaper per item than simulation
  • Semantically precise (answers exactly "did this task land?")
  • No false positives from unrelated revert reasons
  • Can check all items in a single multicall or sequential loop

We'd add a method to TaskSubmitter like is_task_on_chain(task_id) -> bool that reads allTaskHashes, and call it on receipt timeout
before deciding whether to retry or declare idempotent success.
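A minimal sketch of that timeout path (TaskId and the hash type are simplified stand-ins; the real read would go through the generated allTaskHashes binding rather than a pre-fetched slice):

```rust
type TaskId = u64;        // simplified stand-in for the real task identifier
type TaskHash = [u8; 32]; // allTaskHashes value: zero means "never landed"

/// Partition batch items after a receipt timeout: a nonzero stored hash means
/// the atomic create+respond already landed (idempotent success); a zero hash
/// means the item never landed and is safe to re-batch and retry.
fn partition_after_timeout(items: &[(TaskId, TaskHash)]) -> (Vec<TaskId>, Vec<TaskId>) {
    let (landed, retry): (Vec<_>, Vec<_>) = items
        .iter()
        .copied()
        .partition(|(_, hash)| *hash != [0u8; 32]);
    (
        landed.into_iter().map(|(id, _)| id).collect(),
        retry.into_iter().map(|(id, _)| id).collect(),
    )
}

fn main() {
    let mut mined = [0u8; 32];
    mined[31] = 0xab;
    let (landed, retry) = partition_after_timeout(&[(1, mined), (2, [0u8; 32])]);
    assert_eq!(landed, vec![1]);
    assert_eq!(retry, vec![2]);
}
```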

Simulation stays as the last-resort fallback only for non-timeout contract reverts where we can't parse BatchPartialFailure (which
should be rare once we implement the parser).

❯ sounds good

⏺ Question 3: The identify_combined_poison_items simulation sends actual on-chain transactions (not eth_call). Each "simulation" is a
real batch_create_and_respond_to_tasks call that goes through .send().await and get_receipt(). Is this intentional?

Looking at batch_submitter.rs:648-655:
let sim_result = task_submitter
    .batch_create_and_respond_to_tasks(
        batch_contract_addr,
        vec![task.clone()],
        vec![response.clone()],
        vec![sig.clone()],
    )
    .await;

This calls the real AvsWriter::batch_create_and_respond_to_tasks, which calls .send().await β€” submitting an actual transaction, not a
read-only eth_call. If the task is NOT on-chain, the "simulation" actually creates+responds to it on-chain as a side effect. If the
task IS on-chain, the .send() call reverts during gas estimation (before tx submission), so no tx is sent.

This means:

  • For items that are already on-chain: reverts at gas estimation β†’ correctly classified
  • For healthy items: actually submits a real tx β†’ succeeds, but now you have N individual txs instead of one batch

My recommendation: This is actually fine functionally (items get submitted either way), but it's wasteful β€” defeats the purpose of
batching. However, since we're moving to direct allTaskHashes checks for receipt timeouts, simulation will only be reached for
non-timeout contract reverts where parse_batch_partial_failure fails. Once we implement the parser, simulation becomes a truly rare
last-resort fallback.

So: keep simulation as-is for now, but it should almost never be reached after our fixes. The priority path becomes: parse structured
error β†’ strip β†’ retry. Timeout path becomes: check allTaskHashes β†’ classify β†’ retry healthy only.

Agree with keeping simulation as the rare fallback, or do you want to convert it to use eth_call instead of real transactions?

❯ which one is better? which one do you suggest?

⏺ Converting to eth_call is strictly better. Here's why:

Real tx submission (current):

  • Healthy items get submitted as individual txs (N txs instead of 1 batch β€” wastes gas and nonce space)
  • Each goes through the 60s receipt timeout path, potentially cascading further
  • If 2 of 5 items are poison, the 3 healthy ones get submitted individually rather than re-batched

eth_call simulation (proposed):

  • Zero on-chain side effects β€” purely diagnostic
  • Fast (no mempool, no receipt wait β€” just gas estimation)
  • After classification, healthy items stay in the batch for a single re-batched submission
  • No nonce consumption

My recommendation: Convert identify_combined_poison_items to use eth_call via a new TaskSubmitter method like
simulate_batch_create_and_respond(single_item) -> Result<(), ChainIoError>. Under the hood, this calls .call().await instead of
.send().await on the contract instance. The classification logic stays identical β€” we just use the return value to determine healthy
vs poison vs already-exists without actually submitting.

This also eliminates a subtle bug: if simulation succeeds for a "healthy" item (submitting a real tx), and then the re-batched retry
also includes it, you get a TaskAlreadyExists from the item you just accidentally submitted during simulation.

❯ sounds good

⏺ Question 4: The batch accumulator loop flushes new items while a prior batch is stuck in the retry/timeout loop. Should we block
accumulation during flush, or decouple them?

From the load test: Batch 9 (1 item) entered a 60sΓ—2 receipt timeout loop. Meanwhile, 5 new items completed BLS aggregation and
queued up. When batch 9 finally failed at 23:28:56, those 5 items were immediately flushed as batch 10 β€” which then hit the same
cascading timeout pattern.

The current architecture runs the accumulator and flush in the same tokio::spawn task. While flush_combined_batch_with_retry is
await-ing receipt timeouts (up to 180s with 3 retries), the mpsc channel buffers incoming items. When flush returns, the loop
immediately drains the channel and flushes the next batch.

My recommendation: This is actually the correct design β€” we should NOT block accumulation. Items need somewhere to queue while flush
is in progress. The real fix is making flush fast:

  1. Receipt timeout β†’ immediate allTaskHashes check (seconds, not 60s retry)
  2. parse_batch_partial_failure works β†’ instant strip-and-retry (no simulation)
  3. Simulation (rare fallback) uses eth_call β†’ fast, no receipt wait

With these fixes, the worst-case flush time drops from 180s+ (3Γ— receipt timeout) to ~5-10s (one timeout + quick on-chain check + one
retry). The accumulator loop stays as-is β€” the problem was never the architecture, it was the slow error recovery paths.

One small addition: we should cap the number of items that accumulate during a flush to prevent unbounded growth. The channel is
already bounded (max_create_batch_size * 2 = 240), which provides backpressure. That's sufficient.

@denniswon

❯ await-ing receipt timeouts (up to 180s with 3 retries) --> this is way too long. definitely remove and retries should be just 2 for
batches. agreed overall, but double evaluate your suggestion/recommendation

⏺ You're right β€” let me re-evaluate.

The 60s receipt timeout itself is the problem. On Sepolia, block time is 12s. A legitimate transaction should be mined within 1-2
blocks (12-24s). If we haven't gotten a receipt in 30s, the tx is almost certainly either:

  1. Dropped from mempool (nonce replaced, gas too low)
  2. Mined but the RPC node is lagging

Waiting 60s before checking is just dead time. And with 3 retries, that's potentially 180s of waiting for nothing.

My revised recommendation:

  1. Reduce BATCH_RECEIPT_TIMEOUT to 30s β€” still generous for Sepolia (2.5 blocks). On L2s like Base (2s block time), 30s is 15 blocks
    β€” more than enough.
  2. Max retries = 2 for batch methods (down from 3). After the first timeout, we check allTaskHashes. If items are on-chain β†’
    idempotent success. If not β†’ one more retry with fresh nonce. If that also fails, items are failed out.
  3. On receipt timeout, don't retry the batch blindly. The sequence becomes:
    - Timeout fires after 30s
    - Check allTaskHashes for each item in the batch (~1-2s)
    - Items on-chain β†’ idempotent success
    - Items NOT on-chain β†’ re-batch and retry once (attempt 2)
    - If attempt 2 also times out β†’ fail remaining items

Worst case: 30s timeout + 2s check + 30s retry = ~62s. Current worst case: 180s+. That's a 3x improvement, and the common case (items
landed but receipt timed out) resolves in ~32s.

The max_retries = 2 config applies to the batch submitter config, not the writer-level timeout. The writer always does a single
attempt with timeout β€” retries are orchestrated by the batch submitter.
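As a sketch, the agreed budget reduces to a small decision function (constants mirror the numbers above; NextStep is a hypothetical name, not an existing type):

```rust
const BATCH_RECEIPT_TIMEOUT_SECS: u64 = 30; // down from 60s
const MAX_BATCH_ATTEMPTS: u32 = 2;          // down from 3

#[derive(Debug, PartialEq)]
enum NextStep {
    IdempotentSuccess, // task found on-chain after the timeout
    Retry,             // not on-chain, retry budget remains
    FailItem,          // not on-chain, budget exhausted
}

/// Decide what to do with one batch item after a receipt timeout.
/// `attempt` is 1-based: attempt 1 may retry once; attempt 2 fails out.
fn after_receipt_timeout(attempt: u32, on_chain: bool) -> NextStep {
    if on_chain {
        NextStep::IdempotentSuccess
    } else if attempt < MAX_BATCH_ATTEMPTS {
        NextStep::Retry
    } else {
        NextStep::FailItem
    }
}

fn main() {
    // Worst case: 30s timeout + ~2s check + 30s retry = ~62s, vs 180s+ before.
    assert!(BATCH_RECEIPT_TIMEOUT_SECS * MAX_BATCH_ATTEMPTS as u64 + 2 <= 62);
    assert_eq!(after_receipt_timeout(1, true), NextStep::IdempotentSuccess);
    assert_eq!(after_receipt_timeout(1, false), NextStep::Retry);
    assert_eq!(after_receipt_timeout(2, false), NextStep::FailItem);
}
```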

❯ sounds good.

⏺ Question 5: The parse_batch_partial_failure implementation. The as_revert_data() API is already proven at writer.rs:1091. Should we
decode BatchPartialFailure at the writer level (before the trait boundary) or at the batch submitter level?

Looking at the two options:

Option A: Decode at writer level (current stub location, writer.rs:795):

  • parse_batch_partial_failure uses error.as_revert_data() to get raw bytes
  • Checks for 0x19ed9977 selector, decodes FailedItem[] via BatchPartialFailure::abi_decode_raw
  • Returns Some(Vec) with per-item index, task_id, and nested revert reason
  • The existing Err(ChainIoError::BatchPartialFailure { failures }) path already works β€” it just never gets populated because the
    parser returns None

Option B: Decode at batch submitter level:

  • Would require passing the raw alloy::contract::Error through the trait somehow
  • Trait returns Result<_, ChainIoError> (per Question 1), but ContractError is already a variant

My recommendation: Option A β€” decode at the writer level. This is the natural location because:

  1. The writer has direct access to alloy::contract::Error and as_revert_data()
  2. The ChainIoError::BatchPartialFailure variant already exists with the right fields
  3. With the Question 1 fix (trait returns ChainIoError), the batch submitter gets the structured BatchPartialFailure { failures }
    directly
  4. extract_failed_indices in batch_submitter becomes trivial: just match on the ChainIoError::BatchPartialFailure variant and extract
    indices

The nested revert reason bytes inside each FailedItem.reason should also be decoded where possible β€” check for TaskAlreadyExists
(0x2e98c533), TaskAlreadyResponded (0x68905dff), and other known selectors. This lets strip_combined_by_indices classify each
stripped item correctly (idempotent vs poison) without simulation.

@denniswon

Excellent point. Let me think about this more carefully.

The root of the cascading failure was ambiguity in error classification. When the system couldn't definitively identify an error, it
fell back to heuristics (string matching, simulation) that guessed wrong. We need to eliminate guessing entirely.

The definitive approach: Every error that crosses the writer β†’ batch submitter boundary must be exhaustively classified at the source
β€” the writer level β€” where we have full access to the raw error type.

Here's the concrete design:

At writer.rs, the batch_create_and_respond_to_tasks method currently has two exit paths:

  1. .send().await succeeds β†’ pending.get_receipt() β†’ either success or timeout
  2. .send().await fails β†’ ContractError (gas estimation revert)

For path 2, instead of the current parse_batch_partial_failure || fallback to ContractError, we do exhaustive error decoding:

let revert_data = error.as_revert_data();
match revert_data {
    None => ChainIoError::ContractError(error), // not a revert (RPC/transport error)
    Some(data) if data.is_empty() => ChainIoError::CreateNewTaskCallFail {
        reason: "empty revert (assembly-level failure)",
    },
    Some(data) if data.len() < 4 => ChainIoError::CreateNewTaskCallFail {
        reason: "malformed revert data (< 4 bytes)",
    },
    Some(data) => {
        let selector: [u8; 4] = data[..4].try_into().unwrap();
        match selector {
            BatchPartialFailure::SELECTOR => /* decode FailedItem[], return BatchPartialFailure */,
            TaskAlreadyExists::SELECTOR => /* return new variant TaskAlreadyExists */,
            TaskAlreadyResponded::SELECTOR => /* return new variant TaskAlreadyResponded */,
            TaskCreatedBlockTooOld::SELECTOR => /* return specific variant */,
            _ => /* return ContractRevert { selector, data } β€” preserves raw bytes */,
        }
    }
}

Key principle: the selector is always available and always definitive. We decode what we recognize, and for anything we don't
recognize, we preserve the raw selector + data in a new ChainIoError::ContractRevert { selector: [u8; 4], data: Bytes } variant. No
string matching. No guessing. No fallback to simulation for classification.
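A self-contained sketch of this dispatch (the selectors are the ones quoted in this discussion; the FailedItem[] ABI decode is elided, and the enum is illustrative rather than the final ChainIoError shape):

```rust
// Selectors quoted earlier in this thread.
const BATCH_PARTIAL_FAILURE: [u8; 4] = [0x19, 0xed, 0x99, 0x77];
const TASK_ALREADY_EXISTS: [u8; 4] = [0x2e, 0x98, 0xc5, 0x33];
const TASK_ALREADY_RESPONDED: [u8; 4] = [0x68, 0x90, 0x5d, 0xff];

#[derive(Debug, PartialEq)]
enum Decoded {
    BatchPartialFailure(Vec<u8>), // payload still needs FailedItem[] decoding
    TaskAlreadyExists,
    TaskAlreadyResponded,
    UnknownRevert { selector: [u8; 4] },
    MalformedRevert,
}

/// Exhaustive, selector-first classification of raw revert data.
/// No string matching: unrecognized selectors are preserved, never guessed at.
fn decode_revert(data: &[u8]) -> Decoded {
    if data.len() < 4 {
        return Decoded::MalformedRevert;
    }
    let selector: [u8; 4] = data[..4].try_into().expect("length checked above");
    match selector {
        BATCH_PARTIAL_FAILURE => Decoded::BatchPartialFailure(data[4..].to_vec()),
        TASK_ALREADY_EXISTS => Decoded::TaskAlreadyExists,
        TASK_ALREADY_RESPONDED => Decoded::TaskAlreadyResponded,
        other => Decoded::UnknownRevert { selector: other },
    }
}

fn main() {
    assert_eq!(decode_revert(&[0x2e, 0x98, 0xc5, 0x33]), Decoded::TaskAlreadyExists);
    assert_eq!(decode_revert(&[]), Decoded::MalformedRevert);
    assert_eq!(
        decode_revert(&[0xde, 0xad, 0xbe, 0xef]),
        Decoded::UnknownRevert { selector: [0xde, 0xad, 0xbe, 0xef] }
    );
}
```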

Then the BatchPartialFailure payload itself contains FailedItem[] where each item has a reason: bytes β€” that's the nested revert from
each individual _createAndRespond call. We apply the same selector-based decoding to each nested reason:

for failure in &failures {
    let nested_selector = &failure.reason[..4];
    match nested_selector {
        TaskAlreadyExists::SELECTOR => classification = Idempotent,
        TaskAlreadyResponded::SELECTOR => classification = Idempotent,
        _ => classification = Poison(selector, data),
    }
}

This way the batch submitter never needs to string-match, simulate, or guess. Every item is classified definitively by its 4-byte
selector at the writer level.

For receipt timeout (path 1): we already agreed β€” direct allTaskHashes check, not guessing.

Simulation becomes truly dead-letter: only reachable if somehow as_revert_data() returns None for a contract revert (which would be
an alloy bug, not our code). We can log a warning and still fall through to eth_call-based simulation as the absolute last resort,
but it should never fire in practice.

Does this exhaustive selector-based decoding approach address your concern?

@denniswon

Question 6: We need new ChainIoError variants for the exhaustive classification. Here's the proposed enum expansion:

Currently the batch path can produce these ChainIoError variants:

  • CreateNewTaskCallFail { reason: String } β€” catch-all string bucket
  • BatchPartialFailure { failures } β€” structured but never populated
  • ContractError(alloy::contract::Error) β€” raw alloy error
  • AlloyProviderError β€” provider-level error

Proposed additions:

// New variants in ChainIoError:

/// Task already exists on-chain (idempotent β€” prior attempt landed)
TaskAlreadyExists { task_id: Option<B256> },

/// Task already responded (idempotent β€” prior create+respond landed)
TaskAlreadyResponded { task_id: Option<B256> },

/// Contract reverted with a recognized but non-recoverable error
ContractRevert { selector: [u8; 4], data: Bytes },

/// Receipt wait timed out β€” tx may or may not be on-chain
ReceiptTimeout { timeout_secs: u64 },

And update BatchPartialFailure.failures items to carry a classification enum instead of raw bytes:

pub enum BatchItemFailure {
    AlreadyExists,
    AlreadyResponded,
    Revert { selector: [u8; 4], data: Bytes },
    Unknown { data: Bytes },
}

pub struct BatchFailedItem {
    pub index: usize,
    pub task_id: B256,
    pub failure: BatchItemFailure,
}

This means extract_failed_indices in the batch submitter becomes a simple pattern match β€” no string parsing, no simulation. Each
failed item is already classified.

My recommendation: This is the right set of variants. ReceiptTimeout replaces the current CreateNewTaskCallFail { reason: "...timed
out..." } string hack. TaskAlreadyExists/TaskAlreadyResponded surface directly instead of being detected via string matching.
ContractRevert catches everything else with preserved data.

The task_id on TaskAlreadyExists/TaskAlreadyResponded is Option<B256> because when a whole batch reverts with these, we may
not know which specific task β€” but for BatchPartialFailure nested reasons, we always know the task_id from the FailedItem.
Agree with these variants?

@denniswon

Good question. Let me evaluate both approaches for check_tasks_on_chain.

Sequential reads vs Multicall

Context: After a receipt timeout on a batch of N items, we need to check allTaskHashes[taskId] for each item to determine which tasks
actually landed on-chain.

Option A: Sequential reads

for item in &items {
    let hash = task_manager.allTaskHashes(item.task_id).call().await?;
    landed.push(hash != B256::ZERO);
}

Pros:

  • Simple, no extra dependency
  • Each call is independent β€” partial failure doesn't lose results
  • alloy already has the generated binding for allTaskHashes

Cons:

  • N RPC round-trips (one per task in the batch)
  • At batch size 120, that's 120 sequential calls after a timeout β€” adding 10-30s latency on Sepolia
  • Compounds the timeout problem we're trying to fix

Option B: Multicall3

let multicall = Multicall3::new(provider, MULTICALL3_ADDR);
let calls: Vec<_> = items.iter().map(|item| Call3 {
    target: task_manager_addr,
    callData: allTaskHashesCall { taskId: item.task_id }.abi_encode(),
    allowFailure: true,
}).collect();
let results = multicall.aggregate3(calls).call().await?;

Pros:

  • Single RPC round-trip regardless of batch size
  • 120 items checked in ~200ms instead of ~30s
  • Multicall3 is deployed at a deterministic address on all EVM chains (including anvil with --auto-impersonate)

Cons:

  • Needs Multicall3 address constant (but it's 0xcA11bde05977b3631167028862bE2a173976CA11 on every chain)
  • Slightly more complex ABI encoding
  • If Multicall3 isn't deployed (unlikely β€” it's on 99%+ of chains), entire check fails

Option C: eth_call batch simulation (middle ground)

// Simulate the entire batch via eth_call β€” if it succeeds, nothing is on-chain yet.
// If it reverts with BatchPartialFailure, decode which items exist.
let result = batch_contract
    .batchCreateAndRespondToTasks(tasks, responses, sigs)
    .call()
    .await;

This is what we already planned for simulate_batch_create_and_respond. It doubles as the on-chain check β€” items that revert with
TaskAlreadyExists are confirmed on-chain.

Pros:

  • Single RPC call
  • Reuses existing contract ABI (no Multicall3 dependency)
  • Gives us BOTH information: what's on-chain AND what would succeed if resubmitted

Cons:

  • Heavier than a simple storage read (runs full EVM execution per item)
  • Gas estimation overhead (~400K per item Γ— 120 = 48M gas simulated)

My recommendation: Option C (eth_call simulation)

Here's why β€” we already need simulate_batch_create_and_respond for the poison detection path. After a receipt timeout, the question
isn't just "which tasks landed?" β€” it's "which tasks can be resubmitted?" The eth_call simulation answers both in one call:

  • Items that revert with TaskAlreadyExists (0x2e98c533) β†’ confirmed on-chain, send idempotent success
  • Items that revert with TaskAlreadyResponded (0x68905dff) β†’ confirmed on-chain AND responded, send idempotent success
  • Items that succeed β†’ not on-chain, safe to retry
  • Items that revert with other errors β†’ poison, strip them

This means we don't need a separate check_tasks_on_chain method at all. The receipt timeout path becomes:

timeout β†’ simulate_batch(remaining_items) β†’ classify each item β†’ retry/succeed/fail

One method, one RPC call, complete information.
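A sketch of the per-item dispatch this implies (SimOutcome and Action are hypothetical names summarizing what the eth_call simulation would report per item; they are not existing types in the crate):

```rust
#[derive(Debug, PartialEq)]
enum SimOutcome {
    WouldSucceed,         // item is not on-chain; resubmission would pass
    TaskAlreadyExists,    // revert selector 0x2e98c533
    TaskAlreadyResponded, // revert selector 0x68905dff
    OtherRevert([u8; 4]), // any other selector: genuinely poison
}

#[derive(Debug, PartialEq)]
enum Action {
    Retry,             // keep in the re-batched submission
    IdempotentSuccess, // already landed; notify the caller as success
    StripAsPoison,     // remove from the batch and fail the item
}

/// Map one simulated item outcome to the batch submitter's action.
fn action_for(outcome: &SimOutcome) -> Action {
    match outcome {
        SimOutcome::WouldSucceed => Action::Retry,
        SimOutcome::TaskAlreadyExists | SimOutcome::TaskAlreadyResponded => {
            Action::IdempotentSuccess
        }
        SimOutcome::OtherRevert(_) => Action::StripAsPoison,
    }
}

fn main() {
    assert_eq!(action_for(&SimOutcome::WouldSucceed), Action::Retry);
    assert_eq!(action_for(&SimOutcome::TaskAlreadyExists), Action::IdempotentSuccess);
    assert_eq!(
        action_for(&SimOutcome::OtherRevert([0xde, 0xad, 0xbe, 0xef])),
        Action::StripAsPoison
    );
}
```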

β˜… Insight ─────────────────────────────────────
The key insight is that eth_call simulation is strictly more informative than allTaskHashes reads. A storage read tells you "this
task exists on-chain." A simulation tells you "this task exists on-chain AND here's exactly what would happen if you tried to
resubmit it." Since we need the simulation for poison detection anyway, using it for the timeout path too eliminates an entire method
from the trait surface.
─────────────────────────────────────────────────

So the revised TaskSubmitter trait change is:

// Add ONE new method (not two):
async fn simulate_batch_create_and_respond(
    &self,
    batch_contract_addr: Address,
    // element types match the existing batch_create_and_respond_to_tasks signature
    tasks: Vec<_>,
    responses: Vec<_>,
    signature_data_array: Vec<_>,
) -> Result<(), ChainIoError>;
// Returns Ok(()) if all would succeed, BatchPartialFailure if some fail,
// or other ChainIoError for RPC issues

// Change return type on existing method:
async fn batch_create_and_respond_to_tasks(...) -> Result<TransactionReceipt, ChainIoError>;
// (was Result<TransactionReceipt, GatewayError>)

Does this approach make sense, or do you still want the separate check_tasks_on_chain method?

⏺ Waiting for your response on Q7 β€” whether Option C (single simulate_batch_create_and_respond method that handles both timeout
recovery and poison detection) works, or if you'd prefer separate methods.

✻ Cooked for 1m 23s

@denniswon

⏺ Plan is written. Let me exit plan mode so you can review and approve before implementation begins.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Ready to code?

Here is Claude's plan:
β•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œ
Harden Batch Transaction Submission Pipeline

Context

PR #507 (feat/batch-create-and-respond) merged and deployed to stagef+prod. Load testing (15 tasks, centralized mode, single gateway)
showed 60% success (9/15) with 6 recoverable tasks misclassified as poison. Root cause chain:

  1. TransportError(NullResp) on batch 8 triggers retry
  2. Receipt timeout (60s x 3 retries = 180s) blocks accumulator
  3. parse_batch_partial_failure() returns None (stub) β€” falls to simulation
  4. Simulation sends REAL transactions (not eth_call), misclassifies TaskAlreadyExists inside BatchPartialFailure as poison
  5. 6 tasks that were actually on-chain get failed to callers
  6. 15 zombie BLS aggregator slots linger 360s

Every failure in this cascade is recoverable. The protocol is correct β€” the error handling pipeline loses information at each layer.

Design Decisions (from grill-me interview)

β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ # β”‚ Decision β”‚ Rationale β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1 β”‚ TaskSubmitter returns Result<_, ChainIoError> β”‚ Stop losing structured error data via format!() β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 2 β”‚ Receipt timeout β†’ eth_call simulation β”‚ Determine which items landed on-chain without blind β”‚
β”‚ β”‚ β”‚ retry β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 3 β”‚ Poison detection via eth_call (not real txs) β”‚ Current simulation sends real transactions β€” wasteful β”‚
β”‚ β”‚ β”‚ and dangerous β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 4 β”‚ Keep accumulator architecture, timeout=30s, max_retries=2 β”‚ Accumulator is sound; reduce wasted time in retry loops β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 5 β”‚ Decode BatchPartialFailure at writer level β”‚ Use as_revert_data() (proven at writer.rs:1091) β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 6 β”‚ Exhaustive selector-based decoding β”‚ No string matching, no guessing, no fallback defaults β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 7 β”‚ Single simulate_batch method for both timeout recovery and β”‚ One eth_call gives complete classification β”‚
β”‚ β”‚ poison detection β”‚ β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Files to Modify

Layer 1: Error types and classification

  1. crates/chainio/src/error.rs β€” Add new ChainIoError variants, expand is_transient()
  2. crates/chainio/src/avs/errors.rs β€” Add exhaustive selector constants and classify_batch_item_error()

Layer 2: Writer (on-chain interaction)

  1. crates/chainio/src/avs/writer.rs β€” Implement parse_batch_partial_failure(), reduce timeout, add
    simulate_batch_create_and_respond()

Layer 3: TaskSubmitter trait bridge

  1. crates/gateway/src/task/submitter.rs β€” Change return types to ChainIoError, add simulate_batch_create_and_respond()

Layer 4: Batch submitter (retry logic)

  1. crates/gateway/src/rpc/api/batch_submitter.rs β€” Rewrite flush logic to use structured errors, remove string matching

Layer 5: Config

  1. crates/gateway/src/rpc/api/batch_submitter.rs β€” Change defaults: max_retries=2

Layer 6: Mock impls (10 total)

  1. All TaskSubmitter test impls β€” add simulate_batch_create_and_respond(), update return types

Step-by-Step Implementation

Step 1: New ChainIoError variants (crates/chainio/src/error.rs)

Add these variants to the existing ChainIoError enum, plus a new per-item BatchItemError enum:

/// Task already exists on-chain (idempotent — not an error)
#[error("Task {task_id} already exists on-chain")]
TaskAlreadyExists { task_id: B256 },

/// Task already responded to on-chain (idempotent — not an error)
#[error("Task {task_id} already responded")]
TaskAlreadyResponded { task_id: B256 },

/// Contract reverted with a known error selector
#[error("Contract reverted: {name} (0x{selector})")]
ContractRevert {
    selector: String,  // 4-byte hex, e.g. "2e98c533"
    name: String,      // e.g. "TaskAlreadyExists"
    raw_data: Vec<u8>, // full revert data for downstream decoding
},

/// Per-item failure within a batch, with selector-decoded reason
#[derive(Debug, Clone)]
pub enum BatchItemError {
    TaskAlreadyExists,
    TaskAlreadyResponded,
    ContractRevert { selector: String, name: String },
    Unknown { raw: Vec<u8> },
}

Update is_transient() to also match TransactionTimeout:

pub fn is_transient(&self) -> bool {
    matches!(
        self,
        ChainIoError::RpcError(_)
            | ChainIoError::AlloyProviderError(_)
            | ChainIoError::TransactionTimeout { .. }
    )
}

Step 2: Exhaustive selector classification (crates/chainio/src/avs/errors.rs)

Add classify_batch_item_revert(data: &[u8]) -> BatchItemError:

pub fn classify_batch_item_revert(data: &[u8]) -> BatchItemError {
    if data.len() < 4 {
        return BatchItemError::Unknown { raw: data.to_vec() };
    }
    let selector: [u8; 4] = data[..4].try_into().unwrap();
    match selector {
        [0x2e, 0x98, 0xc5, 0x33] => BatchItemError::TaskAlreadyExists,
        [0x68, 0x90, 0x5d, 0xff] => BatchItemError::TaskAlreadyResponded,
        _ => {
            let name = decode_error_selector(selector)
                .unwrap_or_else(|| format!("0x{}", hex::encode(selector)));
            BatchItemError::ContractRevert {
                selector: hex::encode(selector),
                name,
            }
        }
    }
}

Where decode_error_selector is the existing function in writer.rs that maps selectors to names. We'll make it pub and move the
selector-to-name mapping to errors.rs.
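Because the classification is pure byte matching, it can be unit-tested with no chain types at all. A dependency-free sketch covering the two idempotent selectors (the ContractRevert branch is omitted here, since decode_error_selector and hex live in other modules):

```rust
// Simplified stand-in for the plan's BatchItemError (ContractRevert omitted).
#[derive(Debug, PartialEq)]
enum BatchItemError {
    TaskAlreadyExists,
    TaskAlreadyResponded,
    Unknown { raw: Vec<u8> },
}

// Same 4-byte selector gate as classify_batch_item_revert().
fn classify(data: &[u8]) -> BatchItemError {
    if data.len() < 4 {
        return BatchItemError::Unknown { raw: data.to_vec() };
    }
    let selector: [u8; 4] = data[..4].try_into().unwrap();
    match selector {
        [0x2e, 0x98, 0xc5, 0x33] => BatchItemError::TaskAlreadyExists,
        [0x68, 0x90, 0x5d, 0xff] => BatchItemError::TaskAlreadyResponded,
        _ => BatchItemError::Unknown { raw: data.to_vec() },
    }
}

fn main() {
    // Revert data is the selector followed by ABI-encoded arguments.
    assert_eq!(
        classify(&[0x2e, 0x98, 0xc5, 0x33, 0xaa]),
        BatchItemError::TaskAlreadyExists
    );
    assert_eq!(
        classify(&[0x68, 0x90, 0x5d, 0xff]),
        BatchItemError::TaskAlreadyResponded
    );
    // Anything shorter than 4 bytes is Unknown, never a guess.
    assert_eq!(classify(&[0x2e]), BatchItemError::Unknown { raw: vec![0x2e] });
    println!("ok");
}
```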

Step 3: Implement parse_batch_partial_failure() (crates/chainio/src/avs/writer.rs)

Replace the stub at line 795 with real implementation using as_revert_data():

fn parse_batch_partial_failure(
    error: &alloy::contract::Error,
) -> Option<Vec<BatchFailedItem>> {
    let revert_data = error.as_revert_data()?;
    if revert_data.len() < 4 {
        return None;
    }
    let selector: [u8; 4] = revert_data[..4].try_into().ok()?;
    // BatchPartialFailure selector: 0x19ed9977
    if selector != [0x19, 0xed, 0x99, 0x77] {
        return None;
    }
    // Decode: BatchPartialFailure(FailedItem[] failures)
    // FailedItem = (uint256 index, bytes32 taskId, bytes reason)
    // Use alloy ABI decoding
    use alloy::sol_types::SolType;
    type FailedItemTuple = alloy::sol_types::sol_data::Array<(
        alloy::sol_types::sol_data::Uint<256>,
        alloy::sol_types::sol_data::FixedBytes<32>,
        alloy::sol_types::sol_data::Bytes,
    )>;
    let decoded = FailedItemTuple::abi_decode(&revert_data[4..], true).ok()?;
    let items = decoded
        .into_iter()
        .map(|(index, task_id, reason)| BatchFailedItem {
            index: index.to::<usize>(),
            task_id: B256::from(task_id),
            reason: reason.into(),
        })
        .collect();
    Some(items)
}

On error path in batch_create_and_respond_to_tasks, use it to return ChainIoError::BatchPartialFailure:

Err(e) => {
    if let Some(failures) = parse_batch_partial_failure(&e) {
        return Err(ChainIoError::BatchPartialFailure { failures });
    }
    // Classify the top-level revert
    if let Some(revert_data) = e.as_revert_data() {
        let item_err = errors::classify_batch_item_revert(&revert_data);
        // Convert to appropriate ChainIoError variant
        return Err(match item_err { ... });
    }
    Err(ChainIoError::ContractError(e))
}

Step 4: Add simulate_batch_create_and_respond() to AvsWriter

New method using .call() (eth_call) instead of .send():

pub async fn simulate_batch_create_and_respond(
    &self,
    batch_contract_addr: Address,
    tasks: Vec<Task>,                        // element types were lost in rendering;
    responses: Vec<TaskResponse>,            // Task/TaskResponse/SignatureData are stand-ins
    signature_data_array: Vec<SignatureData>,
) -> Result<(), ChainIoError> {
    // Same ABI encoding as batch_create_and_respond_to_tasks
    // but use .call().await instead of .send().await
    contract.batchCreateAndRespondToTasks(tasks, responses, sigs)
        .call()
        .await
        .map(|_| ())
        .map_err(|e| {
            if let Some(failures) = parse_batch_partial_failure(&e) {
                return ChainIoError::BatchPartialFailure { failures };
            }
            ChainIoError::ContractError(e)
        })
}

Step 5: Reduce BATCH_RECEIPT_TIMEOUT (crates/chainio/src/avs/writer.rs)

const BATCH_RECEIPT_TIMEOUT: Duration = Duration::from_secs(30); // was 60

Step 6: Change TaskSubmitter trait (crates/gateway/src/task/submitter.rs)

Change batch method return types from Result (anyhow) to Result<TransactionReceipt, ChainIoError>:

async fn batch_create_and_respond_to_tasks(
    &self,
    batch_contract_addr: Address,
    tasks: Vec<Task>,                        // element types were lost in rendering;
    responses: Vec<TaskResponse>,            // Task/TaskResponse/SignatureData are stand-ins
    signature_data_array: Vec<SignatureData>,
) -> Result<TransactionReceipt, ChainIoError>;

async fn simulate_batch_create_and_respond(
    &self,
    batch_contract_addr: Address,
    tasks: Vec<Task>,
    responses: Vec<TaskResponse>,
    signature_data_array: Vec<SignatureData>,
) -> Result<(), ChainIoError>;

// Also change these existing methods:
async fn batch_create_tasks(...) -> Result<TransactionReceipt, ChainIoError>;
async fn batch_respond_to_tasks(...) -> Result<TransactionReceipt, ChainIoError>;

Remove the format!() error wrapping in the AvsWriter impl β€” return ChainIoError directly.

Step 7: Rewrite flush_combined_batch_with_retry() (crates/gateway/src/rpc/api/batch_submitter.rs)

The core rewrite. Replace string-matching error classification with pattern matching on ChainIoError variants.

Change max_retries default from 3 to 2:
fn default_batch_max_retries() -> u32 { 2 } // was 3

New flush logic (pseudocode):

loop {
    match task_submitter.batch_create_and_respond_to_tasks(...).await {
        Ok(receipt) => {
            // Success — notify all callers
            for (item, task, ..) in built_items {
                let _ = item.result_tx.send(Ok((receipt.clone(), task)));
            }
            return;
        }
        Err(ChainIoError::BatchPartialFailure { failures }) => {
            // Structured per-item failures — classify each
            let failed_map: HashMap<usize, &BatchFailedItem> = ...;
            let mut healthy = vec![];
            for (i, item) in built_items.into_iter().enumerate() {
                if let Some(failure) = failed_map.get(&i) {
                    let classification = classify_batch_item_revert(&failure.reason);
                    match classification {
                        BatchItemError::TaskAlreadyExists
                        | BatchItemError::TaskAlreadyResponded => {
                            // Idempotent success
                            let _ = item.0.result_tx.send(Ok((dummy_receipt(), item.1)));
                        }
                        _ => {
                            // Non-recoverable — fail this item
                            let _ = item.0.result_tx.send(Err(GatewayError::Internal(...)));
                        }
                    }
                } else {
                    healthy.push(item);
                }
            }
            if healthy.is_empty() { return; }
            built_items = healthy;
            attempt += 1;
            if attempt > max_retries { fail_all; return; }
            continue; // retry healthy remainder
        }
        // NOTE: this arm must come before the generic transient arm below,
        // because is_transient() also matches TransactionTimeout and would
        // otherwise shadow it.
        Err(ChainIoError::TransactionTimeout { .. }) => {
            // Receipt timeout — simulate to classify
            match task_submitter.simulate_batch_create_and_respond(...).await {
                Ok(()) => {
                    // Nothing on-chain yet — safe to retry
                    attempt += 1;
                    if attempt > max_retries { fail_all; return; }
                    continue;
                }
                Err(ChainIoError::BatchPartialFailure { failures }) => {
                    // Some items landed — classify each (same as above)
                    ...
                }
                Err(e) => {
                    // Simulation itself failed — treat as transient
                    attempt += 1;
                    if attempt > max_retries { fail_all; return; }
                    continue;
                }
            }
        }
        Err(e) if e.is_transient() => {
            attempt += 1;
            if attempt > max_retries { fail_all; return; }
            tokio::time::sleep(backoff).await;
            backoff = min(backoff * 2, Duration::from_secs(5));
            continue;
        }
        Err(e) => {
            // Unknown non-transient error — try simulation to salvage
            match task_submitter.simulate_batch_create_and_respond(...).await {
                // Same classification as timeout path
                ...
            }
        }
    }
}
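The transient-arm backoff above is deterministic and easy to check in isolation. A std-only sketch; the 500 ms starting value is an assumption for illustration, not taken from the gateway config:

```rust
use std::time::Duration;

// Produce the sleep schedule for `max_retries` transient retries,
// doubling each time and capping at 5 seconds, as in the pseudocode.
fn backoff_schedule(initial: Duration, max_retries: u32) -> Vec<Duration> {
    let cap = Duration::from_secs(5);
    let mut backoff = initial;
    let mut schedule = Vec::new();
    for _ in 0..max_retries {
        schedule.push(backoff);
        backoff = std::cmp::min(backoff * 2, cap);
    }
    schedule
}

fn main() {
    // With max_retries = 2 the worker sleeps at most twice before failing all.
    let schedule = backoff_schedule(Duration::from_millis(500), 2);
    assert_eq!(
        schedule,
        vec![Duration::from_millis(500), Duration::from_secs(1)]
    );
    println!("{:?}", schedule);
}
```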

Remove:

  • extract_failed_indices() β€” replaced by direct pattern match on BatchPartialFailure
  • identify_combined_poison_items() β€” replaced by simulate_batch_create_and_respond()
  • is_transient_error() helper β€” replaced by ChainIoError::is_transient()
  • All error_str.contains(...) string matching

Step 8: Update identify_combined_poison_items β†’ simulation via eth_call

Replace the current implementation that sends REAL single-item batch transactions with eth_call simulation:

async fn simulate_and_classify_items(
    items: Vec<BuiltItem>,  // item type lost in rendering; BuiltItem is a stand-in name
    task_submitter: &Arc<dyn TaskSubmitter>,
    batch_contract_addr: Address,
) -> (Vec<BuiltItem>, Vec<BuiltItem>, Vec<BuiltItem>) {
    // (healthy, poison, idempotent)
    // For each item, call simulate_batch_create_and_respond with a single-item batch
    // Classify the result using BatchItemError
}
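The three-way split this helper performs is just a partition over per-item simulation results; a dependency-free sketch with simplified stand-in types:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Classification {
    Healthy,    // simulation succeeded; safe to resubmit
    Idempotent, // TaskAlreadyExists / TaskAlreadyResponded; treat as success
    Poison,     // any other revert; fail the caller and drop the item
}

// Split items into (healthy, poison, idempotent) buckets by classification.
fn partition<T>(items: Vec<(T, Classification)>) -> (Vec<T>, Vec<T>, Vec<T>) {
    let (mut healthy, mut poison, mut idempotent) = (vec![], vec![], vec![]);
    for (item, class) in items {
        match class {
            Classification::Healthy => healthy.push(item),
            Classification::Poison => poison.push(item),
            Classification::Idempotent => idempotent.push(item),
        }
    }
    (healthy, poison, idempotent)
}

fn main() {
    let items = vec![
        ("task-a", Classification::Healthy),
        ("task-b", Classification::Idempotent),
        ("task-c", Classification::Poison),
        ("task-d", Classification::Healthy),
    ];
    let (healthy, poison, idempotent) = partition(items);
    assert_eq!(healthy, vec!["task-a", "task-d"]);
    assert_eq!(poison, vec!["task-c"]);
    assert_eq!(idempotent, vec!["task-b"]);
}
```

Only the healthy bucket is resubmitted; idempotent items resolve their callers immediately with a success, and poison items fail fast instead of burning the retry budget.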

Step 9: Return TransactionTimeout from writer on receipt timeout

Currently the timeout path in batch_create_and_respond_to_tasks returns a generic error. Change to:

Err(_elapsed) => {
    Err(ChainIoError::TransactionTimeout { timeout_secs: BATCH_RECEIPT_TIMEOUT.as_secs() })
}

Step 10: Update all mock TaskSubmitter implementations (10 impls)

Add simulate_batch_create_and_respond returning Ok(()) (default: all items would succeed).
Change return types on batch methods from the anyhow-style Result to Result<TransactionReceipt, ChainIoError>.

Impls to update:

  1. SuccessSubmitter β€” batch_submitter.rs tests
  2. FailingSubmitter β€” batch_submitter.rs tests
  3. PoisonDetectingSubmitter β€” batch_submitter.rs tests
  4. StallTaskSubmitter β€” tx_worker.rs tests
  5. MockAvsWriter β€” chain_routing.rs tests
  6. MockAvsWriter β€” common/mod.rs tests
  7. MockAvsWriter β€” send_task.rs tests
  8. DummySubmitter β€” secrets_encrypted.rs tests
  9. RecordingTaskSubmitter β€” create_task_proof_cid.rs tests
  10. AvsWriter production impl β€” submitter.rs
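The mock update pattern can be shown with a flattened trait; the real TaskSubmitter is async and uses alloy types, so every name and signature below is a simplified stand-in:

```rust
// Stand-in error type; the real trait uses ChainIoError.
type MockError = String;

trait TaskSubmitterLike {
    // Default for mocks: simulation reports that every item would succeed.
    fn simulate_batch_create_and_respond(&self) -> Result<(), MockError> {
        Ok(())
    }
}

// Most mocks only need the default.
struct SuccessSubmitter;
impl TaskSubmitterLike for SuccessSubmitter {}

// A mock that reports a failure, for exercising the poison-detection path.
struct FailingSubmitter;
impl TaskSubmitterLike for FailingSubmitter {
    fn simulate_batch_create_and_respond(&self) -> Result<(), MockError> {
        Err("BatchPartialFailure".to_string())
    }
}

fn main() {
    assert!(SuccessSubmitter.simulate_batch_create_and_respond().is_ok());
    assert!(FailingSubmitter.simulate_batch_create_and_respond().is_err());
    println!("ok");
}
```

A default method keeps 9 of the 10 impls to a one-line change; only mocks that exercise failure paths override it.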

Step 11: Update batch_submitter tests

Update existing tests to verify new error handling:

  • test_batch_create_success β†’ verify structured receipt propagation
  • test_batch_create_with_poison_items β†’ verify BatchPartialFailure decoding and strip
  • test_batch_create_panic_isolation β†’ unchanged (panic handling)
  • test_batch_submit_multiple_items β†’ verify idempotent success for TaskAlreadyExists

Add new tests:

  • test_batch_timeout_then_simulate β€” receipt timeout β†’ simulation classifies items
  • test_batch_partial_failure_decoding β€” verify BatchPartialFailure selector decoding
  • test_batch_max_retries_2 β€” verify max 2 retries (not 3)

Verification

  1. make fmt && make clippy β€” no warnings
  2. cargo test -p newton-prover-chainio --lib β€” writer/error tests pass
  3. cargo test -p newton-prover-gateway --lib β€” batch_submitter tests pass
  4. make gateway_e2e β€” single-chain E2E
  5. make e2e scenario=basic β€” unified E2E (production config path)

@denniswon
Author

β˜… Insight ─────────────────────────────────────
Here's what this implementation achieves across the full error pipeline:

  1. No more string matching: Every error classification uses typed ChainIoError variants or 4-byte selector matching via
    classify_batch_item_revert(). Zero contains() calls on error strings.
  2. No more stub functions: parse_batch_partial_failure() now uses as_revert_data() to extract and ABI-decode FailedItem[] from
    BatchPartialFailure reverts. extract_failed_indices() (the empty stub) is deleted entirely.
  3. Simulation uses eth_call, not real txs: simulate_and_classify() calls simulate_batch_create_and_respond() which uses .call()
    (read-only) instead of .send(). No more wasting gas on poison detection.
  4. Reduced retry budget: max_retries defaults to 2 (was 3), BATCH_RECEIPT_TIMEOUT is 30s (was 60s). Worst-case cascade: 30s Γ— 2 =
    60s, down from 60s Γ— 3 = 180s.
    ─────────────────────────────────────────────────

Here's a summary of all changes made:

Layer 1: Error types (crates/chainio/src/error.rs)

  • Added BatchItemError enum with selector-based classification
  • Added ChainIoError variants: TaskAlreadyExists, TaskAlreadyResponded, ContractRevert, TransactionTimeout, RetriesExhausted,
    TransactionReverted
  • Added is_transient() (includes TransactionTimeout) and is_idempotent() methods

Layer 2: Error classification (crates/chainio/src/avs/errors.rs)

  • Added selectors module with byte-array constants
  • Added classify_batch_item_revert() β€” exhaustive 4-byte selector classification
  • Added classify_top_level_revert() β€” top-level revert to ChainIoError variant

Layer 3: Writer (crates/chainio/src/avs/writer.rs)

  • BATCH_RECEIPT_TIMEOUT: 60s β†’ 30s
  • Implemented parse_batch_partial_failure() (was stub returning None)
  • Added simulate_batch_create_and_respond() using .call() (eth_call)
  • Fixed all batch method error paths with typed ChainIoError variants

Layer 4: TaskSubmitter trait (crates/gateway/src/task/submitter.rs)

  • Batch methods return Result<TransactionReceipt, ChainIoError> (was an anyhow-style Result via GatewayError)
  • Added simulate_batch_create_and_respond() to trait + AvsWriter impl

Layer 5: Batch submitter (crates/gateway/src/rpc/api/batch_submitter.rs)

  • max_retries default: 3 β†’ 2
  • Removed is_transient_error(), extract_failed_indices(), strip_combined_by_indices(), strip_by_indices()
  • Rewrote flush_combined_batch_with_retry() with typed ChainIoError pattern matching
  • Rewrote flush_respond_batch() similarly
  • Replaced identify_combined_poison_items() (real txs) with simulate_and_classify() (eth_call)

Layer 6: Mock impls (6 test files)

  • Updated all 9 mock TaskSubmitter impls with new return types and simulate_batch_create_and_respond()
