@markerikson
Created April 17, 2026 01:55

Replay Case Study: How Replay Helped Find a React Bug Faster than Dan Abramov Did

Authors: Mark Erikson · Cover photo: No · Date: April 17, 2026 · Hidden: Yes · Tags: Case studies, Story

We’ve always said that Replay time-travel recordings enable developers to solve really hard bugs by giving them the ability to inspect the app’s behavior at any point in time. This is especially true for timing bugs and race conditions, which can be impossible to debug with standard developer tools: pausing at a breakpoint changes the timing enough to keep the race from happening at all. Now, with Replay MCP, we’re giving agents those same time-travel investigation superpowers.

I recently tried using Replay MCP to investigate a complex React internals bug, one that Dan Abramov had already tracked down and fixed with an agent’s help, but only after a month of investigation and false trails. Would Replay enable an agent to find the right answer? And how long would it take?

What are Replay time-travel recordings?

Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.

What is Replay MCP?

Replay DevTools is our debugging UI for humans - browser devtools with time-travel built in. Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to do the same time-travel debugging a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!

Background: The React useDeferredValue Bug

In early February, former React core team member and well-known React expert Dan Abramov filed React issue #35821: useDeferredValue gets stuck with a stale value. He reported that he had seen useDeferredValue get “stuck” in prod builds and never re-render with the updated result. He included a fairly minimal repro with two textboxes: typing in the first sent the entries to the server, and the results were reflected into the second via useDeferredValue. This worked fine in dev builds, but in prod builds it would semi-consistently get stuck.
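The pattern in question looks roughly like this. This is a paraphrased sketch of the setup described in the issue and later in this post, not Dan’s actual repro code; `renderAction` is the server action returning `Promise<ReactNode>`:

```jsx
"use client";
import { Suspense, use, useDeferredValue, useState } from "react";
import { renderAction } from "./actions"; // server action: (text) => Promise<ReactNode>

function Output({ promise }) {
  // use() unwraps the promise, suspending until it resolves
  return <div>{use(promise)}</div>;
}

export default function Demo() {
  const [promise, setPromise] = useState(() => renderAction(""));
  // Keep showing the previous resolved output while the new promise is pending
  const deferred = useDeferredValue(promise);
  return (
    <>
      <textarea onChange={(e) => setPromise(renderAction(e.target.value))} />
      <Suspense fallback={<div>Loading…</div>}>
        <Output promise={deferred} />
      </Suspense>
    </>
  );
}
```

On each keystroke a new promise is created; useDeferredValue is supposed to keep the old output visible until the new promise resolves, then catch up.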

A month later, Dan filed React PR #36134: Fix useDeferredValue getting stuck . The actual fix was 4 lines of code deep in ReactFiberWorkLoop.js to ensure React’s internal “lanes” data structure got updated properly.
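React schedules work using “lanes”, bitmasks of pending updates. To make the rest of this post easier to follow, here is a toy model of the bookkeeping involved. The names mirror React’s internals, but the code is illustrative only, not React’s actual implementation:

```javascript
// Toy model of React's "lanes" bitmasks (illustrative only; the real logic
// lives in ReactFiberLane.js / ReactFiberWorkLoop.js and is far more involved).
const DeferredLane = 1 << 19; // 524288, the lane value seen in the recordings

function createRoot() {
  return { pendingLanes: 0, suspendedLanes: 0, pingedLanes: 0 };
}

function scheduleUpdate(root, lane) {
  root.pendingLanes |= lane;
}

function markRootSuspended(root, lanes) {
  root.suspendedLanes |= lanes;
  root.pingedLanes &= ~lanes;
}

// A ping only revives lanes that are already marked suspended:
function pingSuspendedRoot(root, lanes) {
  root.pingedLanes |= root.suspendedLanes & lanes;
}

// Suspended lanes are skipped unless they have been pinged:
function getNextLanes(root) {
  return root.pendingLanes & ~(root.suspendedLanes & ~root.pingedLanes);
}

// Normal order: suspend first, then ping. The lane becomes schedulable again.
const ok = createRoot();
scheduleUpdate(ok, DeferredLane);
markRootSuspended(ok, DeferredLane);
pingSuspendedRoot(ok, DeferredLane);
console.log(getNextLanes(ok)); // 524288: work can resume

// The race in the bug: the ping fires while the render is still in progress,
// before markRootSuspended has run. The ping is a no-op, and the lane ends
// up suspended with no ping on record.
const stuck = createRoot();
scheduleUpdate(stuck, DeferredLane);
pingSuspendedRoot(stuck, DeferredLane); // suspendedLanes is still 0: wasted
markRootSuspended(stuck, DeferredLane);
console.log(getNextLanes(stuck)); // 0: the lane is permanently stranded
```

In this model the bug is easy to state: a ping that arrives before markRootSuspended runs is lost, and nothing ever schedules the lane again.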

Interestingly, the PR itself was described as “Written/debugged by Claude”.

Dan’s Agent Investigation Thoughts

After filing the PR, Dan posted his thoughts on the investigation process on Bluesky:

i tried to get Claude to fix the PR a month ago and it spent a ton of tokens jumping between wrong conclusions, unable to even write a failing test. hopeless. Today Andrew Clark gave me a tip which led to a fix in just five minutes.

the tip was to tell claude to instrument react production bundle with logs. that's all that was really needed

my workflow was:

  • tell it to repro the bug with my real repro via plawright
  • then tell it to study react source code and relevant parts (which i suggested)
  • add logs to repro
  • fix

this is just to say that the most valuable thing it needs to find bugs is information over time, especially if you're dealing with some mutable system. with the right log granularity and a high level idea of what algorithm should be doing, it'll find it

i think there's something to this particular sequence of

  • reliably confirm the repro
  • study the relevant code (without looking for the bug yet)
  • instrument the repro with logs
  • only THEN, try to pinpoint the problem
  • red/green workflow to verify fix

will try it again in the future

This is exactly why Replay exists, and why we’ve built Replay MCP!

Once you have a Replay recording of a bug, you can investigate it as much and as deeply as you want. The runtime execution becomes data you can query. When did React render, and why? How many times did a given line of code execute? What was the value of x every time this line of code ran?

Given that, I wanted to see how well a standard AI agent could investigate the same bug when given only the bug report and access to Replay recordings of the issue.

Investigation Process and Setup

For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.

In other investigations, I’ve found that context and investigation prompts have a huge influence on the results: telling an agent how deep to go, giving it directions on scientific-method steps, and providing context on the available codebase all matter.

I did an initial run to see if the agent could even get close to the correct answer. I was thrilled to see that just by analyzing the Replay recordings, the agent successfully identified the root cause and the fix in under 10 minutes!

That alone is an amazing result. As Dan described, his own agent took “a month” and “a ton of tokens”, and even then his agent couldn’t solve the problem until he got a hint from Andrew Clark. In comparison, just having Replay recordings available to investigate was enough for an agent to solve it right away!

With that as a baseline, I set up a proper experiment: how much do prompts and investigation instructions matter? I kicked off four parallel agent investigation sessions. Each was given access to the recordings and demo app source, but with varying prompt instructions:

  1. Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”
  2. Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening
  3. Additional summary of React’s internal scheduling system concepts
  4. Additional list of Replay MCP Tools and their purposes

How would they do? What differences would we see between them in results or investigation times?

Investigation Session Results

I was thrilled to see that with Replay recordings of the bug available, all 4 agent sessions nailed the actual root cause and produced valid suggested fixes, in under 30 minutes!

Here’s how they tackled the investigation.

Agent #1 (Basic Instructions): 28 Minutes

Agent #1 was given the least context and instructions, so it spent the most time trying to orient itself and understand React’s internals. It also went down the most rabbit holes chasing false leads :)

After starting with the RecordingOverview tool and seeing that the prod recording ended with a render commit mismatch and no SuspenseResumed commit, it dug into the implementation of useDeferredValue. It got stuck on promise semantics for several minutes before concluding it was a scheduling problem.

It used the Logpoint tool in pingSuspendedRoot and confirmed there were ping issues. It then spent another 10 minutes tracing through RSC promise resolution, including checking call stacks to confirm that portions of the flow were synchronous, before it found the relevant Suspense ternary condition and analyzed it.

It ultimately proposed three fixes: fixing the pingSuspendedRoot ternary (matching PR #36134), making pings async, and updating pinged lanes after renders complete.

This was the longest session, but it did a remarkably thorough job of tracing through the complexity of React’s internals and using Replay MCP Logpoints and other tools to understand what was going on.

Agent #1 prompt

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.

Behavior:

  • A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
  • The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
  • In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
  • In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value

Recordings:

  • Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
  • Prod (broken): c0b7c6fb-8f2a-461c-9fca-0ba2e9be3083

In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.

What we know:

  • No console errors in either recording
  • The server action POST requests all return 200 OK with valid data
  • The app code is identical between dev and prod — only the build mode differs
  • It only manifests in production builds

Goal: Figure out why the deferred value stops updating in production and identify the root cause.

Deliverable

Write a detailed bug report with:

  1. Bug Description — What happens
  2. Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
  3. Suggested Fix — What code change would fix this
  4. Key Evidence Trail — Table of tools used and findings at each step
Agent #1 final analysis

Bug Report: useDeferredValue Permanently Stalls in React 19 Production with Server Action RSC Payloads


1. Bug Description

When using useDeferredValue with a promise returned from a server action (the useDeferredValue(promise) pattern), the deferred output frequently gets permanently stuck in production builds. The deferred value stops updating and never catches up to the final input.

Repro: A textarea input calls renderAction(value) (a server action returning Promise<ReactNode>), wraps it in useDeferredValue, then consumes via use() inside <Suspense>. In dev mode, the deferred output tracks correctly. In prod, it stalls — e.g., user types "hello world from prod" but deferred output shows "hello world from pro" forever.

Affected version: React 19.3.0-canary (via Next.js 16.1.6 production build)

2. Root Cause Analysis

This is a race condition in React's concurrent renderer where a synchronous ping from an RSC lazy chunk fires during an active render, causing the deferred lane to be permanently lost.

The Bug Mechanism

The failure occurs in pingSuspendedRoot (react-dom-client.production.js L13073-13089) when three conditions converge simultaneously:

  1. root.suspendedLanes = 0 — The lane hasn't been marked as suspended yet because markRootSuspended runs after the render completes, but the ping fires during the render
  2. executionContext & 2 is truthy — The ping fires inside the render context
  3. workInProgressRootExitStatus = 4 (RootSuspended) — Set by renderDidSuspendDelayIfPossible in throwException, before attachPingListener is called

This creates a double failure:

Failure 1 — Lane not pinged (L13076):

root.pingedLanes |= root.suspendedLanes & pingedLanes;
//                   ^^^^^^^^^^^^^^^^^ = 0, so pingedLanes stays 0

Failure 2 — No recovery path (L13078-13088):

workInProgressRoot === root                           // TRUE
  && (workInProgressRootRenderLanes & pingedLanes)    // TRUE
  && (4 === workInProgressRootExitStatus              // TRUE (RootSuspended)
      ? 0 === (executionContext & 2)                  // FALSE! In render context.
        && prepareFreshStack(root, 0)                 // NOT CALLED
      : (workInProgressRootPingedLanes |= pingedLanes) // NOT REACHED (wrong ternary branch)
     )

The ternary evaluates 4 === workInProgressRootExitStatus as true, taking the true-branch where executionContext & 2 blocks prepareFreshStack. The else-branch (workInProgressRootPingedLanes |= pingedLanes) is never reached. The lane is lost with no recovery path.

How the Synchronous Ping Happens

The key trigger is the RSC (React Server Components) client's ReactPromise.prototype.then implementation (react-server-dom-turbopack-client.browser.production.js:539-563):

// L542-549
if (this.status === "resolved_model") initializeModelChunk(this);
if (this.status === "fulfilled") {
  if (typeof resolve === "function") resolve(this.value); // SYNCHRONOUS!
}

When attachPingListener calls .then(pingSuspendedRoot) on an RSC chunk that is already fulfilled (or transitions to fulfilled during initializeModelChunk), the ping callback fires synchronously, while still inside throwAndUnwindWorkLoop → throwException, which is inside renderRootConcurrent with executionContext = 2.

Detailed Timeline (from prod recording c0b7c6fb)

| Time | Event | Key State |
| --- | --- | --- |
| 5147ms | User types "d" (final keystroke) | Sync render, lanes=2 |
| 5150ms | Commit #37 (last ever) | markRootFinished clears suspendedLanes=0, pingedLanes=0 |
| 5150ms | 1st deferred render starts | use(promise) → pending → SuspenseException |
| 5151ms | Render yields after suspension | markRootSuspended sets suspendedLanes=524800 |
| 5158ms | Pings 1 & 2 (fetch response) | executionContext=0, suspendedLanes=524800; pings work correctly |
| 5159ms | 2nd deferred render starts | prepareFreshStack; promise now fulfilled, render proceeds |
| 5163ms | RSC lazy chunk encountered | Pending → SuspenseException → yield |
| 5169ms | Re-entry: chunk not resolved | throwAndUnwindWorkLoop → throwException → renderDidSuspendDelayIfPossible sets exitStatus=4 |
| 5169ms | attachPingListener .then() | RSC chunk .then() fires synchronously (chunk already fulfilled) |
| 5169ms | pingSuspendedRoot (THE BUG) | executionContext=2, suspendedLanes=0, pingedLanes=0; lane 524288 LOST |
| 5191ms | Render completes | markRootSuspended sets suspendedLanes=524288 but pingedLanes=0 — permanently stuck |

Why Dev Mode Works

In development builds, the RSC client likely doesn't trigger synchronous resolution in .then(), or React's dev-mode Suspense handling processes thenables differently. The ping always fires asynchronously (when executionContext = 0), so root.suspendedLanes is properly populated by the time the ping runs.

Evidence: Dev recording shows 125 render commits (every keystroke gets both a SetState and SuspenseResumed). Prod shows only 38 commits, with the last being a SetState with no follow-up SuspenseResumed.

3. Suggested Fix

There are two complementary fixes, either of which would prevent this class of bug:

Fix A: Handle synchronous pings during render in pingSuspendedRoot

The ternary at L13078-13088 has a logic gap. When exitStatus === 4 and executionContext & 2, the code should still update workInProgressRootPingedLanes so the render can retry the pinged lanes when it completes:

// In pingSuspendedRoot, after L13076:
if (workInProgressRoot === root) {
  if ((workInProgressRootRenderLanes & pingedLanes) === pingedLanes) {
    if (4 === workInProgressRootExitStatus || ...) {
      if (0 === (executionContext & 2)) {
        prepareFreshStack(root, 0);
      } else {
        // NEW: Even if we can't restart now, mark lanes as pinged
        // so the scheduler picks them up after render completes
        workInProgressRootPingedLanes |= pingedLanes;
      }
    } else {
      workInProgressRootPingedLanes |= pingedLanes;
    }
  }
}

Fix B: Prevent synchronous ping resolution from RSC chunks

In attachPingListener or in the RSC client's ReactPromise.prototype.then, ensure the ping callback is always invoked asynchronously (e.g., via microtask), so it never fires inside the render context:

// In attachPingListener, wrap the ping:
wakeable.then(
  () => queueMicrotask(() => pingSuspendedRoot(root, pingedLanes)),
  () => queueMicrotask(() => pingSuspendedRoot(root, pingedLanes))
);

Fix C: Also update root.pingedLanes from workInProgressRootPingedLanes after render

In performWorkOnRoot, after renderRootConcurrent returns with exitStatus = 4 (RootSuspended), propagate any in-render pinged lanes to the root before calling markRootSuspended:

// Before markRootSuspended in the exitStatus=4 path:
root.pingedLanes |= workInProgressRootPingedLanes;

Recommended approach: Fix A is the most targeted — it addresses the exact ternary logic gap without changing ping timing semantics. Fix B is a defense-in-depth measure. Fix C ensures the information isn't lost even if other paths hit the same issue.

4. Key Evidence Trail

| Step | Tool | Finding |
| --- | --- | --- |
| 1 | RecordingOverview (both recordings) | Dev: 125 commits, Prod: 38 commits. Same React 19.3.0-canary. |
| 2 | ReactRenders commits (prod) | Last commit #37 is SetState (2 fibers); no SuspenseResumed follows. Recording continues 5+ seconds. |
| 3 | NetworkRequest detail (prod) | All 21 POSTs return 200 OK. Final POST body "hello world from prod" returns correct data at 5158ms. |
| 4 | ReactRenders trigger-detail | Commit #37 triggered by keypress "d". Previous commit #36 was SuspenseResumed from network. |
| 5 | SearchSources / ReadSource | Found updateDeferredValueImpl (L5407-5424): sync renders defer, deferred renders accept. |
| 6 | Logpoint on updateDeferredValueImpl | Last 8 hits show hit #6 (5150ms) and #8 (5159ms) accepting deferred value — but no render commits follow. |
| 7 | DescribePoint at hit #8 (Point:162) | Promise is fulfilled with correct "hello world from prod" ReactNode. |
| 8 | SearchSources for pingSuspendedRoot | Found L13073-13089: lane pinged via `root.pingedLanes \|= root.suspendedLanes & pingedLanes` |
| 9 | Logpoint on pingSuspendedRoot L13076 | 3 pings after final keystroke. Pings 1-2 at 5158ms work (suspendedLanes=524800). Ping 3 at 5169ms fails (suspendedLanes=0, executionContext=2). |
| 10 | Logpoint on markRootSuspended L12149 | 4 hits. Hit 4 at 5191ms: suspendedLanes goes 0→524288 with pingedLanes=0. Permanent stuck state. |
| 11 | ReadSource + DescribePoint on throwException L6205-6218 | renderDidSuspendDelayIfPossible() called BEFORE attachPingListener() — sets exitStatus=4 before ping can fire. |
| 12 | GetStack at Point:182 (broken ping) | Call stack proves synchronous invocation: pingSuspendedRoot ← RSC .then() ← attachPingListener ← throwException ← throwAndUnwindWorkLoop ← renderRootConcurrent. |
| 13 | ReadSource RSC client L539-563 | ReactPromise.prototype.then: if status is "fulfilled", calls resolve(this.value) synchronously. |
| 14 | ReadSource performWorkOnRoot L11847-11851 | exitStatus=0 (yielded): no markRootSuspended called, so root.suspendedLanes stays 0 during re-entry render. |
| 15 | Evaluate at Point:182 | Confirmed: root.suspendedLanes=0, root.pingedLanes=0, executionContext=2, workInProgressRootExitStatus=4. All conditions for the double failure met. |

Summary

This is a React 19 scheduler bug where a synchronous ping from an RSC lazy chunk fires during an active concurrent render, hitting a logic gap in pingSuspendedRoot's ternary condition. The deferred lane is neither pinged on the root nor recorded in workInProgressRootPingedLanes, causing it to be permanently suspended with no recovery mechanism. The bug is prod-only because the RSC client in production resolves chunks synchronously in .then(), while the dev bundle resolves them asynchronously.

Agent #2 (Investigation Methodology): 17 Minutes

Agent #2 also started with RecordingOverview and quickly identified the commit mismatch. It then used Replay MCP’s React render trigger details to trace the causation chain from keystrokes to render commits.

It made its way to pingSuspendedRoot and used the Logpoint tool to check the hits and values inside. It got briefly sidetracked on scheduling behavior before coming back to the rendering logic.

Along the way, it actually identified a second potential bug that none of the other runs found. It spotted an isThenableResolved(thenable) call that checks whether a promise is resolved, and identified that RSC promises use a different status value, "resolved_model", instead of "fulfilled". It suggested a tweak to this logic to allow resuming synchronously instead of re-throwing.

In the end it also suggested the common “re-ping lanes” fix that the other agents and the PR arrived at, but this was a fascinating insight, derived solely from inspecting the runtime behavior in this one recording.
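Agent #2’s report quotes the check in question; here is a runnable simplification. The function body matches the shape quoted in its report, but this is not the exact bundled source:

```javascript
// Simplified sketch of the status check Agent #2 flagged (not the exact
// react-dom source; see the agent's report below for the quoted version).
function isThenableResolved(thenable) {
  const status = thenable.status;
  return status === "fulfilled" || status === "rejected";
}

// RSC flight-client chunks report an intermediate status once the HTTP
// payload has arrived but before it has been deserialized:
const rscChunk = { status: "resolved_model" };

console.log(isThenableResolved(rscChunk)); // false: treated as still pending
console.log(isThenableResolved({ status: "fulfilled" })); // true
```

Because "resolved_model" fails the check, React treats data that has already arrived as still pending and unwinds the render instead of resuming it.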

Agent #2 prompt

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.

Behavior:

  • A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
  • The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
  • In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
  • In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value

Recordings:

  • Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
  • Prod (broken): c0b7c6fb-8f2a-461c-9fca-0ba2e9be3083

In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.

What we know:

  • No console errors in either recording
  • The server action POST requests all return 200 OK with valid data
  • The app code is identical between dev and prod — only the build mode differs
  • It only manifests in production builds

Goal: Figure out why the deferred value stops updating in production and identify the root cause.

Investigation Methodology

Follow this structured process:

Step 1: Establish the baseline

Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.

Step 2: Confirm the visual bug

Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.

Step 3: Compare render commit patterns

Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?

Step 4: Map the data flow end-to-end

Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.

Step 5: Find the divergence point

The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.

Step 6: Trace state through the critical path

Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.

Step 7: Explain why dev vs prod

Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.

Step 8: Red-team your conclusion

Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?

Deliverable

Write a detailed bug report with:

  1. Bug Description — What happens
  2. Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
  3. Suggested Fix — What code change would fix this
  4. Key Evidence Trail — Table of tools used and findings at each step
Agent #2 final analysis

Bug Report: useDeferredValue Gets Permanently Stuck in Production with RSC Server Actions

1. Bug Description

When using useDeferredValue with a promise returned by a React Server Action inside <Suspense>, the deferred output permanently stops updating in production builds. The last update is lost — the output gets stuck one (or more) characters behind the input and never catches up, even though the server action completes successfully and returns correct data.

Reproduction: Type rapidly into a textarea that triggers a server action on each keystroke. The server action returns Promise<ReactNode>, which is passed through useDeferredValue and consumed via use() inside <Suspense>. In dev mode, output catches up after typing stops. In prod mode, output freezes permanently behind.

2. Root Cause Analysis

This is a race condition between React's Suspense ping mechanism and RSC chunk resolution, exposed only in production due to faster React execution.

The Two Interacting Bugs

Bug A: isThenableResolved doesn't recognize RSC intermediate states

React's isThenableResolved function (react-dom-client.production.js:3394) only checks for status === "fulfilled" or status === "rejected". But RSC flight client chunks use an intermediate "resolved_model" status, meaning HTTP data has arrived but hasn't been fully deserialized yet. When React's renderRootConcurrent resumes a suspended render (case 7: SuspendedOnData) and checks isThenableResolved(thrownValue), it returns false for "resolved_model" chunks and falls through to throwAndUnwindWorkLoop — treating already-available data as still-pending.

Bug B: Synchronous ping fires before lane is marked suspended, creating a dead lane

After throwAndUnwindWorkLoop, React calls attachPingListener which attaches .then(pingSuspendedRoot) to the RSC chunk. But RSC's ReactPromise.prototype.then (line 539) detects the "resolved_model" status, synchronously calls initializeModelChunk() (transitioning to "fulfilled"), and synchronously invokes pingSuspendedRoot — all within the same call stack as attachPingListener.

At this moment, the render is still in progress. root.suspendedLanes is 0 because markRootSuspended hasn't run yet. So pingSuspendedRoot computes:

root.pingedLanes |= root.suspendedLanes & pingedLanes
//                    ^^^^^^^^^^^^^^^^
//                    = 0 (not yet set!)
// Result: pingedLanes stays 0. Ping is wasted.

Later (~26ms), the render completes with exit status 4 (RootSuspended). markRootSuspended finally sets root.suspendedLanes |= 524288. But the ping already fired and won't fire again. The lane is now:

  • pendingLanes: has bit 19 ✓
  • suspendedLanes: has bit 19 ✓
  • pingedLanes: 0 ✗

getNextLanes computes nonIdlePendingLanes & ~suspendedLanes = 0. The lane is permanently stuck — pending but unreachable.

The Fatal Timeline (from prod recording)

| Time (ms) | Event | State |
| --- | --- | --- |
| 5147 | Final keystroke "d" | setPromise(renderAction("hello world from prod")) |
| 5149 | POST request sent | Server action in flight |
| 5158 | POST response arrives | RSC chunk status → "resolved_model" |
| 5165 | renderRootConcurrent case 7 | isThenableResolved("resolved_model") → false |
| 5165 | throwAndUnwindWorkLoop | Suspense boundary catches, render continues unwinding |
| 5168 | attachPingListener | .then(pingSuspendedRoot) attached to chunk |
| 5169 | RSC .then() fires synchronously | initializeModelChunk → "fulfilled" → pingSuspendedRoot |
| 5169 | Wasted ping | suspendedLanes=0, so pingedLanes stays 0 |
| 5191 | Render completes | Exit status 4 (RootSuspended) |
| 5191 | markRootSuspended | suspendedLanes = 524288 (too late for ping) |
| 5192 | scheduleTaskForRootDuringMicrotask | getNextLanes → 0. Dead lane. No more renders. |

Why Dev Mode Doesn't Hit This

Dev React has extra overhead: StrictMode double-renders, validation checks, debug IO tracking in trackUsedThenable (lines 3838-3854). This extra time creates a window where the RSC chunk transitions from "resolved_model" → "fulfilled" before React's case 7 check runs. So isThenableResolved returns true, replaySuspendedUnitOfWork is called instead of throwAndUnwindWorkLoop, and the value is consumed directly — no ping needed.

3. Suggested Fix

There are two complementary fixes, either of which would prevent the dead lane:

Fix A (React reconciler): Recognize RSC intermediate resolved states

In isThenableResolved, also check for "resolved_model" (and potentially other RSC intermediate resolved states like "resolved_module"):

// react-reconciler: ReactFiberThenable.js
function isThenableResolved(thenable) {
  var status = thenable.status;
  return status === "fulfilled" || status === "rejected"
    || status === "resolved_model" || status === "resolved_module";
}

This way, when case 7 checks the thenable, it recognizes the data has arrived and calls replaySuspendedUnitOfWork instead of throwAndUnwindWorkLoop. The render succeeds without needing a ping at all.

Fix B (React reconciler): Guard against synchronous pings during render

In pingSuspendedRoot, if the lane isn't yet in suspendedLanes (because render is still in progress), defer the ping or ensure the lane gets re-pinged after markRootSuspended:

// react-reconciler: ReactFiberWorkLoop.js
function markRootSuspended(root, suspendedLanes, ...) {
  root.suspendedLanes |= suspendedLanes;
  root.pingedLanes &= ~suspendedLanes;
  // NEW: Check if any of these lanes were pinged while render was in progress
  // If so, immediately re-ping them
  if (root._pendingPingsDuringRender & suspendedLanes) {
    root.pingedLanes |= root._pendingPingsDuringRender & suspendedLanes;
    root._pendingPingsDuringRender = 0;
  }
}

Or, alternatively in pingSuspendedRoot, detect that a render is in progress and record the ping for later application.

Fix A is the cleaner solution — it avoids the ping mechanism entirely for already-resolved data and handles the root cause (misidentification of RSC chunk state). Fix B is a defense-in-depth measure for any other scenario where synchronous pings could race with in-progress renders.

4. Key Evidence Trail

| Step | Tool | Finding |
| --- | --- | --- |
| Baseline comparison | RecordingOverview (both recordings) | Dev: 125 commits, 318ms render. Prod: 38 commits, 107ms render. 3x render difference. |
| Visual confirmation | Screenshot (both, final timestamp) | Dev output matches input. Prod stuck at "hello world from pro" — one char behind. |
| Last render commits | ReactRenders commits mode (prod) | Last commit #37 at 5.15s is a setState for the typing event, not a Suspense resolution. No commit for final value. |
| Network verification | NetworkRequest detail (prod, last 2 POSTs) | Both return 200 OK. Final POST contains "hello world from prod" — correct data. |
| Trigger chains | ReactRenders trigger-detail (prod, commits 35-37) | Commit 36 triggered by network response for "hello world from pro". No commit triggered by final response. |
| Ping analysis | Logpoint on pingSuspendedRoot | 3 pings after final keystroke. Ping #3 (Point:114, 5169ms): suspendedLanes=0, ping is a no-op. |
| Re-suspension | Logpoint on markRootSuspended | Point:118 (5191ms): Lane 524288 re-suspended. pingedLanes already 0. |
| Scheduling dead-end | Logpoint on scheduleTaskForRootDuringMicrotask | After re-suspension: getNextLanes returns 0. Lane stuck forever. |
| isThenableResolved | Logpoint on line 3394 + DescribePoint | Returns false for status === "resolved_model". Only checks "fulfilled"/"rejected". |
| Case 7 path | DescribePoint on renderRootConcurrent case 7 | isThenableResolved false → throwAndUnwindWorkLoop instead of replaySuspendedUnitOfWork. |
| RSC chunk .then() | ReadSource on RSC client code | ReactPromise.prototype.then synchronously resolves "resolved_model" → "fulfilled" and invokes callback inline. |
| Dev vs prod timing | DescribePoint comparison of trackUsedThenable | Dev has extra debug IO code (lines 3838-3854) creating enough delay for chunk to reach "fulfilled" before case 7 check. |

5. Impact Assessment

  • Severity: High — causes permanent UI staleness with no recovery
  • Scope: Any Next.js app using useDeferredValue + use() + RSC server actions with rapid updates
  • Workarounds: None clean. Adding artificial delays would mask but not fix. Avoiding useDeferredValue with server actions works but defeats the purpose.
  • React version affected: 19.3.0-canary-f93b9fd4-20251217 (and likely all React 19 builds with RSC support)

Agent #3 (React Scheduling Concepts): 8 Minutes

Agent #3 started with RecordingOverview to check the output, and Screenshot to confirm the problem visually. Since its prompt already included the terminology for React’s scheduling internals, it made a beeline for functions like markRootSuspended, even without any actual knowledge of the implementation.

It used the Logpoint tool to check hits for pingSuspendedRoot and markRootSuspended, and quickly identified the relevant ternary logic as the culprit. It then compared line hit counts against the dev recording, and arrived at the pinged-lanes solution without any major rabbit trails.

Agent #3 prompt

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.

Behavior:

  • A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
  • The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
  • In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
  • In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value

Recordings:

  • Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
  • Prod (broken): c0b7c6fb-8f2a-461c-9fca-0ba2e9be3083

In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.

What we know:

  • No console errors in either recording
  • The server action POST requests all return 200 OK with valid data
  • The app code is identical between dev and prod — only the build mode differs
  • It only manifests in production builds

Goal: Figure out why the deferred value stops updating in production and identify the root cause.

Investigation Methodology

Follow this structured process:

Step 1: Establish the baseline

Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.

Step 2: Confirm the visual bug

Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.

Step 3: Compare render commit patterns

Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?

Step 4: Map the data flow end-to-end

Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.

Step 5: Find the divergence point

The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.

Step 6: Trace state through the critical path

Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.

Step 7: Explain why dev vs prod

Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.

Step 8: Red-team your conclusion

Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?

Conceptual Context: React's Scheduling System

These concepts will help you navigate the React internals you'll encounter:

  • Lanes: React uses a bitmask-based priority system called "lanes." Each update is assigned a lane (e.g., SyncLane, TransitionLane, DeferredLane). useDeferredValue schedules work on deferred lanes, which are lower priority than user-initiated updates.
  • Suspension and pinging: When a component throws a Promise (Suspense), React marks the update's lanes as "suspended." When the promise resolves, a "ping" fires that marks those lanes as "pinged," telling the scheduler to retry. The lifecycle is: pending → rendering → suspended → pinged → retry rendering → committed.
  • workInProgress state: During rendering, React tracks the current render's state in module-scoped variables prefixed with workInProgress — things like what root is being rendered, what lanes, what the exit status is, which lanes have been pinged during this render. These are mutable and change throughout the render cycle.
  • markRootSuspended: When a render suspends, this function marks the lanes as suspended on the root. It uses workInProgressRootPingedLanes to exclude lanes that were pinged during the current render (so they don't get stuck as suspended). It also adds lanes to warmLanes which affects future scheduling.
  • Concurrent rendering and yielding: In prod, React uses time-slicing — renders can yield back to the browser mid-work and resume later. This means microtasks (like promise .then() callbacks) can fire between render chunks. In dev mode, renders tend to run longer without yielding due to extra dev checks, so microtasks fire after the render completes.
  • The key timing question: When a suspended promise resolves, the .then() ping callback might fire during an active render (prod, due to yielding) or after the render completes (dev, due to slower execution). If the ping handling code doesn't correctly handle the "ping arrives mid-render" case, the lane can get stuck.

Deliverable

Write a detailed bug report with:

  1. Bug Description — What happens
  2. Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
  3. Suggested Fix — What code change would fix this
  4. Key Evidence Trail — Table of tools used and findings at each step
Agent #3 final analysis

Bug Report: useDeferredValue Permanently Stalls in Production React Builds

1. Bug Description

When using useDeferredValue with a server action that returns Promise<ReactNode>, wrapped in <Suspense> and consumed via use(), the deferred output frequently stops updating in production builds. The user types "hello world from prod" but the deferred display is stuck showing "hello world from pro" — one character behind, permanently. The same app code works correctly in dev mode.

2. Root Cause Analysis

The bug is in React's pingSuspendedRoot function — specifically, a code path that silently drops a Suspense ping when it arrives during an active concurrent render with RootSuspendedWithDelay exit status.

The mechanism step by step:

1. User types the final 'd' in "prod" (5149ms). React:

  • Schedules a new server action POST for "hello world from prod"
  • Begins a concurrent render on lane 524288 (DeferredLane)

2. The server action responds quickly (5155ms, ~6ms round-trip). The promise's .then() callback fires pingSuspendedRoot at 5169ms — while React is still mid-render of the same lane due to concurrent time-slicing.

3. Inside pingSuspendedRoot (line 13073), the ping enters the critical branch:

function pingSuspendedRoot(root, wakeable, pingedLanes) {
  // Line 13076: Try to mark root.pingedLanes
  root.pingedLanes |= root.suspendedLanes & pingedLanes;
  //                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  //  BUG: root.suspendedLanes is STILL 0 here!
  //  The lane hasn't been marked suspended yet (render is still in progress).
  //  So this is: 0 |= (0 & 524288) = 0. NO EFFECT.

  root.warmLanes &= ~pingedLanes;  // warmLanes also 0 at this point

  // Line 13078: Are we currently rendering this root+lanes?
  workInProgressRoot === root &&           // TRUE (same root)
  (workInProgressRootRenderLanes & pingedLanes) === pingedLanes &&  // TRUE (same lane)

  // Line 13080: Ternary — exitStatus check
  (4 === workInProgressRootExitStatus ||   // TRUE (RootSuspendedWithDelay = 4)
   (...))
    // FIRST BRANCH taken:
    ? 0 === (executionContext & 2) && prepareFreshStack(root, 0)
    //       ^^^^^^^^^^^^^^^^^^^^^^
    //  BUG: executionContext IS 2 (RenderContext)! We're mid-render!
    //  So 0 === (2 & 2) is FALSE. prepareFreshStack is NOT called.
    //  This entire branch is a no-op.

    // ELSE BRANCH (never reached because ternary was truthy):
    : (workInProgressRootPingedLanes |= pingedLanes)
    //  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //  This would have saved us! But it's unreachable.

  ensureRootIsScheduled(root);  // Called, but getNextLanes returns 0
}

4. The render completes and calls markRootSuspended (line 12141):

function markRootSuspended(root, suspendedLanes, spawnedLane, didAttemptEntireTree) {
  suspendedLanes &= ~workInProgressRootPingedLanes;  // 524288 &= ~0 = 524288 (no exclusion)
  root.suspendedLanes |= suspendedLanes;              // Lane 524288 marked SUSPENDED
  root.pingedLanes &= ~suspendedLanes;                // Clears any pings (already 0)
  didAttemptEntireTree && (root.warmLanes |= suspendedLanes);  // Lane marked WARM
}

5. The lane is now permanently stuck:

  • root.pendingLanes = 524288 ✓ (work pending)
  • root.suspendedLanes = 524288 ✗ (excluded from scheduling)
  • root.pingedLanes = 0 ✗ (no ping recorded to rescue it)
  • root.warmLanes = 524288 ✗ (excluded from cold-retry path)

6. getNextLanes (line 706) returns 0 — all three rescue paths fail:

  1. pendingLanes & ~suspendedLanes = 0 (suspended)
  2. pingedLanes & nonIdlePendingLanes = 0 (not pinged)
  3. nonIdlePendingLanes & ~warmLanes = 0 (warm)

The scheduler permanently sees no work. The UI is stuck.
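The deadlock arithmetic can be checked standalone. Here is a minimal sketch in plain JavaScript (not React's actual getNextLanes, just the three checks from the report, using the stuck values above):

```javascript
// Model of the stuck root state after markRootSuspended (values from the report).
const DeferredLane = 524288; // 1 << 19

const root = {
  pendingLanes: DeferredLane,   // work is still pending...
  suspendedLanes: DeferredLane, // ...but the lane is marked suspended
  pingedLanes: 0,               // the ping was dropped, so nothing rescues it
  warmLanes: DeferredLane,      // and it's "warm", blocking the cold-retry path
};

// The three rescue paths from the report, as bitmask arithmetic:
const notSuspended = root.pendingLanes & ~root.suspendedLanes; // path 1
const pinged = root.pingedLanes & root.pendingLanes;           // path 2
const coldRetry = root.pendingLanes & ~root.warmLanes;         // path 3

console.log(notSuspended, pinged, coldRetry); // 0 0 0 — no schedulable work
```

All three expressions evaluate to 0, so a scheduler consulting them would find no work for this lane, ever.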

Why dev mode works

In development mode, React runs significantly slower due to:

  • Extra validation checks, warning infrastructure, DevTools hooks
  • StrictMode double-rendering

This means renders take longer and never yield to the browser mid-render. Promise .then() microtask callbacks therefore always fire after the render completes. In every single ping across 126 hits in the dev recording, workInProgressRoot !== root and executionContext === 0 — the ping always takes the fast path of just setting root.pingedLanes (line 13076, where root.suspendedLanes is already set) and calling ensureRootIsScheduled. The buggy ternary at line 13078 is never entered.

In production, concurrent time-slicing allows yields mid-render, so microtasks from resolved promises can fire during the render. This creates the timing window where the ping arrives before markRootSuspended runs.

3. Suggested Fix

The fix is in pingSuspendedRoot. When the ping fires mid-render (executionContext has RenderContext bit set) and the exit status prevents prepareFreshStack, the function must still record the ping in workInProgressRootPingedLanes so that markRootSuspended can exclude the lane from suspension.

Option A (minimal fix): When the prepareFreshStack guard fails, fall through to set workInProgressRootPingedLanes:

// In pingSuspendedRoot, replace the ternary at lines 13080-13086:

// BEFORE (buggy):
(4 === workInProgressRootExitStatus || (...))
  ? 0 === (executionContext & 2) && prepareFreshStack(root, 0)
  : (workInProgressRootPingedLanes |= pingedLanes)

// AFTER (fixed):
if (4 === workInProgressRootExitStatus || (...)) {
  if (0 === (executionContext & 2)) {
    prepareFreshStack(root, 0);
  } else {
    // Ping arrived mid-render. Can't restart now, but record the ping
    // so markRootSuspended won't mark this lane as suspended.
    workInProgressRootPingedLanes |= pingedLanes;
  }
} else {
  workInProgressRootPingedLanes |= pingedLanes;
}

Option B (defensive): Additionally ensure root.pingedLanes is set even when the lane isn't yet suspended, so the ping survives regardless of ordering:

// At the top of pingSuspendedRoot, unconditionally record the ping on the root
// for lanes that are pending (not just suspended):
root.pingedLanes |= root.suspendedLanes & pingedLanes;
root.pingedLanes |= root.pendingLanes & pingedLanes;  // NEW: catch pre-suspension pings

Option A is the more surgical fix addressing the exact logic error. Option B is belt-and-suspenders.

4. Key Evidence Trail

| Step | Tool | Finding |
| --- | --- | --- |
| Recording overview | RecordingOverview (both) | Dev: 125 commits, 63 POSTs. Prod: 38 commits, 21 POSTs. Massive gap. |
| Visual confirmation | Screenshot (both at final timestamps) | Dev shows "hello world from prod" in output. Prod shows "hello world from pro" — stuck. |
| Render commit tails | ReactRenders commit mode (both) | Dev ends with SuspenseResumed at 8.47s. Prod ends with SetState at 5.15s — no final SuspenseResumed. |
| Network completion | NetworkRequest detail (prod #36) | Final POST "hello world from prod" returns 200 OK at 5155ms with valid RSC payload. Data arrived. |
| App source code | grepika_get on app/client.tsx | useDeferredValue(promise) → use() in <Suspense>. Standard pattern. |
| pingSuspendedRoot analysis | ReadSource + Logpoint on line 13073 | 46 total hits. Hit #5 (Point:114, 5169ms) fires mid-render: executionContext=2, root.suspendedLanes=0, workInProgressRootExitStatus=4. Ping silently dropped. |
| markRootSuspended aftermath | Logpoint on line 12147 + post-state eval | After line 12161: root.suspendedLanes=524288, root.pingedLanes=0, root.warmLanes=524288. Lane permanently stuck. |
| getNextLanes deadlock | ReadSource on line 706 | Three rescue paths all return 0 with stuck state. No work scheduled. |
| Dev comparison | Logpoint on line 13073 (dev recording) | All 126 pings: executionContext=0, workInProgressRoot !== root. Buggy branch never entered. |
| markRootSuspended exclusion logic | ReadSource on line 12147 | suspendedLanes &= ~workInProgressRootPingedLanes — would exclude pinged lanes, but workInProgressRootPingedLanes=0 because the ping was dropped. |

5. Summary

This is a race condition in React's concurrent rendering scheduler. When a Suspense promise resolves during an active concurrent render (via microtask firing during a yield), pingSuspendedRoot enters a code path where:

  1. root.pingedLanes can't be set (lane not yet suspended on root)
  2. prepareFreshStack can't be called (currently in render context)
  3. workInProgressRootPingedLanes isn't set (unreachable else-branch of ternary)

The ping is completely lost. The subsequent markRootSuspended marks the lane as suspended with no ping, and getNextLanes can never find work for it again. The fix is to ensure workInProgressRootPingedLanes is set in the mid-render case, so markRootSuspended excludes the lane from suspension.
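For intuition, the race reduces to a few lines of lane bookkeeping. Here is a toy model (hypothetical helper names, heavily simplified from React's actual functions) contrasting the two orderings:

```javascript
// Toy model of the ping/suspend ordering race. Not React's real code —
// just the two bookkeeping steps described above, with hypothetical names.
const Lane = 524288;

function makeRoot() {
  return { suspendedLanes: 0, pingedLanes: 0 };
}

// Simplified pingSuspendedRoot: only records a ping for already-suspended lanes.
function ping(root, lanes) {
  root.pingedLanes |= root.suspendedLanes & lanes;
}

// Simplified markRootSuspended: suspends the lanes and clears their pings.
function suspend(root, lanes) {
  root.suspendedLanes |= lanes;
  root.pingedLanes &= ~lanes;
}

// Dev ordering: the render finishes (suspend) first, then the ping fires.
const dev = makeRoot();
suspend(dev, Lane);
ping(dev, Lane);
console.log(dev.pingedLanes); // 524288 — the retry gets scheduled

// Prod ordering: the ping fires mid-render, before markRootSuspended.
const prod = makeRoot();
ping(prod, Lane);    // no-op: suspendedLanes is still 0
suspend(prod, Lane); // clears any ping bits; lane stuck
console.log(prod.pingedLanes); // 0 — the ping is lost
```

Same two operations, opposite order, and only one ordering leaves a ping behind to trigger the retry.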

Agent #4 (React Concepts + Replay Tools Overview): 7 Minutes

Agent #4 started off the same way as #3, with RecordingOverview and Screenshot tool calls. After scanning the source for pingSuspendedRoot in the React bundle, it tried to use Logpoint calls but struggled a bit with the syntax, so it switched to Evaluate, which similarly retrieves real values in scope at specific execution points.

It found the relevant ternary logic in 4 minutes, confirmed the issue and compared with the dev build, did some red-team review, and wrote up the final report with the correct pinged lanes solution.

This run was the most efficient: it read the source, knew where to instrument, and produced the best final report.

Agent #4 prompt

You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.

Behavior:

  • A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
  • The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
  • In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
  • In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value

Recordings:

  • Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
  • Prod (broken): c0b7c6fb-8f2a-461c-9fca-0ba2e9be3083

In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.

What we know:

  • No console errors in either recording
  • The server action POST requests all return 200 OK with valid data
  • The app code is identical between dev and prod — only the build mode differs
  • It only manifests in production builds

Goal: Figure out why the deferred value stops updating in production and identify the root cause.

Investigation Methodology

Follow this structured process:

Step 1: Establish the baseline

Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.

Step 2: Confirm the visual bug

Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.

Step 3: Compare render commit patterns

Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?

Step 4: Map the data flow end-to-end

Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.

Step 5: Find the divergence point

The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.

Step 6: Trace state through the critical path

Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.

Step 7: Explain why dev vs prod

Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.

Step 8: Red-team your conclusion

Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?

Conceptual Context: React's Scheduling System

These concepts will help you navigate the React internals you'll encounter:

  • Lanes: React uses a bitmask-based priority system called "lanes." Each update is assigned a lane (e.g., SyncLane, TransitionLane, DeferredLane). useDeferredValue schedules work on deferred lanes, which are lower priority than user-initiated updates.
  • Suspension and pinging: When a component throws a Promise (Suspense), React marks the update's lanes as "suspended." When the promise resolves, a "ping" fires that marks those lanes as "pinged," telling the scheduler to retry. The lifecycle is: pending → rendering → suspended → pinged → retry rendering → committed.
  • workInProgress state: During rendering, React tracks the current render's state in module-scoped variables prefixed with workInProgress — things like what root is being rendered, what lanes, what the exit status is, which lanes have been pinged during this render. These are mutable and change throughout the render cycle.
  • markRootSuspended: When a render suspends, this function marks the lanes as suspended on the root. It uses workInProgressRootPingedLanes to exclude lanes that were pinged during the current render (so they don't get stuck as suspended). It also adds lanes to warmLanes which affects future scheduling.
  • Concurrent rendering and yielding: In prod, React uses time-slicing — renders can yield back to the browser mid-work and resume later. This means microtasks (like promise .then() callbacks) can fire between render chunks. In dev mode, renders tend to run longer without yielding due to extra dev checks, so microtasks fire after the render completes.
  • The key timing question: When a suspended promise resolves, the .then() ping callback might fire during an active render (prod, due to yielding) or after the render completes (dev, due to slower execution). If the ping handling code doesn't correctly handle the "ping arrives mid-render" case, the lane can get stuck.

Replay MCP Tool Strategy

You have access to powerful runtime inspection tools. Here's how to use them effectively for this investigation:

  • RecordingOverview — Start here for each recording. Gets you error counts, render commit stats, network summary, environment info. Run on BOTH recordings first.
  • ReactRenders — Your primary comparison tool. Use mode: commits to see the full commit sequence. Use mode: commit with commitIndex to drill into individual commits. Focus on the last few commits in each recording — where does prod diverge from dev?
  • NetworkRequest — Verify all server action POSTs returned successfully. Use to rule out data-level issues early. Check timing of responses relative to render commits.
  • SearchSources — Find functions in the recorded bundles by name. React 19 ships unminified production bundles, so function names are preserved. Search for scheduling and reconciler functions to find their locations.
  • ReadSource — Read source code with execution hit counts per line. This is a built-in static profiler — you can see which branches executed and how many times. Use this to understand code flow without needing to instrument anything.
  • LogpointYour most powerful investigation tool. Evaluates any JavaScript expression at a specific source line, every time that line executes. Returns the expression value at each hit with timestamps. This is equivalent to adding console.log statements retroactively — unlimited, no rebuild, no reproduction needed.
    • Use beginPoint/endPoint to filter hits to a specific time window (critical for high-frequency functions)
    • Build complex string expressions to capture multiple values in one logpoint: lanes=${lanes} status=${status} pinged=${root.pingedLanes}
    • Place logpoints at key decision points in scheduling functions to trace mutable state over time
  • DescribePoint / Evaluate — Inspect full program state at a specific execution point. Use after Logpoint identifies an interesting moment — drill in to see all variables, the call stack, and dependency chains.
  • Screenshot — Confirm visual state at timestamps. Use at the end of each recording to verify the bug visually.

Investigation pattern: Overview both recordings → Compare render commits → Read source to understand the algorithm → Instrument with Logpoints to trace state through the critical window → Find where dev and prod state diverges → Explain the mechanism.

Deliverable

Write a detailed bug report with:

  1. Bug Description — What happens
  2. Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
  3. Suggested Fix — What code change would fix this
  4. Key Evidence Trail — Table of tools used and findings at each step
Agent #4 final analysis

Bug Report: React useDeferredValue Stuck in Production — Race Condition in pingSuspendedRoot

1. Bug Description

When using useDeferredValue with a Promise<ReactNode> (fed by a server action) inside <Suspense>, the deferred output permanently stops updating in production builds. The final typed character never renders — the deferred value gets stuck one character behind the input. This does not happen in dev mode.

In the investigated case, the user typed "hello world from prod" but the output is permanently stuck showing "hello world from pro".

2. Root Cause Analysis

The bug is a race condition in React's pingSuspendedRoot function in react-dom-client.production.js. It only manifests in production because production builds use concurrent time-slicing, which allows microtasks (promise .then() callbacks) to fire during an active render.

The Flow That Breaks

  1. User types the final character ('d'). React starts rendering the deferred lane (lane 524288 = 0x80000).
  2. During rendering, the component suspends (throws a Promise) because the server action response hasn't arrived yet. React sets workInProgressRootExitStatus = 4 (RootSuspended) but continues in the render context (executionContext = 2).
  3. The server action response arrives at ~5155ms. The promise resolves, firing the .then() ping callback at 5169ms — while the render is still active.
  4. pingSuspendedRoot fires mid-render with this state:
    • workInProgressRoot === root → TRUE (active render)
    • workInProgressRootExitStatus === 4 (suspended)
    • executionContext & 2 !== 0 (mid-render)
    • root.suspendedLanes === 0 (lane hasn't been marked suspended on root yet — that happens later in markRootSuspended)
  5. Three things fail simultaneously:
    • root.pingedLanes not set: Line 13076 does root.pingedLanes |= root.suspendedLanes & pingedLanes, but root.suspendedLanes is still 0 (not yet marked), so this is a no-op.
    • prepareFreshStack blocked: The ternary at lines 13080-13086 takes the ? branch (because exitStatus === 4), but prepareFreshStack is guarded by 0 === (executionContext & 2) which is false (mid-render). Result: the expression evaluates to false and nothing happens.
    • workInProgressRootPingedLanes not set: The : branch (workInProgressRootPingedLanes |= pingedLanes) is only reached when exitStatus is NOT 4 or 3+fallback. Since exitStatus IS 4, we're on the ? branch, and this line never executes.
  6. 5191ms — markRootSuspended runs after the render completes:
    • suspendedLanes = 524288
    • workInProgressRootPingedLanes = 0 (the ping was lost!)
    • Line 12147: suspendedLanes &= ~0 → 524288 (no exclusion)
    • Line 12149: root.suspendedLanes |= 524288 → lane marked suspended
    • Line 12150: root.pingedLanes &= ~524288 → any ping bit cleared
    • Line 12151: root.warmLanes |= 524288 → lane added to warm lanes
    • Result: Lane 524288 is permanently suspended and warm, with no ping to trigger a retry.

Why Dev Mode Works

In dev mode, React is slower due to extra validation, warnings, and StrictMode double-renders. It doesn't yield to the browser mid-render as often. The Replay recording confirms this: the condition workInProgressRoot === root inside pingSuspendedRoot has 0 hits in dev. The ping .then() callback always fires after the render completes, so pingSuspendedRoot takes the early-exit path (no active render) and ensureRootIsScheduled correctly reschedules the retry.

3. The Exact Bug in Code

File: packages/react-reconciler/src/ReactFiberWorkLoop.js (in the React repo)

Function: pingSuspendedRoot

Prod bundle location: react-dom-client.production.js lines 13078-13088

// Current code (simplified):
if (workInProgressRoot === root &&
    (workInProgressRootRenderLanes & pingedLanes) === pingedLanes) {

  if (exitStatus === RootSuspended || (exitStatus === RootSuspendedWithDelay && ...)) {
    // TRUE branch: try to restart
    if ((executionContext & RenderContext) === 0) {
      prepareFreshStack(root, 0);  // Can't do this mid-render!
    }
    // If mid-render, NOTHING happens — ping is silently dropped
  } else {
    // FALSE branch: record the ping for later
    workInProgressRootPingedLanes |= pingedLanes;  // Never reached when exitStatus===4
  }
}

The bug: When exitStatus === 4 (suspended) AND executionContext has the render bit set (mid-render), neither prepareFreshStack nor workInProgressRootPingedLanes |= pingedLanes executes. The ping is completely lost.

4. Suggested Fix

When the render has suspended and we're mid-render (can't restart), we should still record the ping in workInProgressRootPingedLanes so that markRootSuspended will exclude the pinged lane from being marked suspended:

// In pingSuspendedRoot, the ternary handling for exitStatus === 4:
if (exitStatus === RootSuspended || (exitStatus === RootSuspendedWithDelay && ...)) {
  if ((executionContext & RenderContext) === 0) {
    prepareFreshStack(root, 0);
  } else {
    // FIX: Record the ping even when we can't restart mid-render.
    // markRootSuspended will check workInProgressRootPingedLanes
    // and exclude this lane from being marked suspended.
    workInProgressRootPingedLanes |= pingedLanes;
  }
} else {
  workInProgressRootPingedLanes |= pingedLanes;
}

This ensures that when markRootSuspended runs at line 12147 (suspendedLanes &= ~workInProgressRootPingedLanes), the pinged lane is excluded and won't be permanently stuck. The subsequent ensureRootIsScheduled (line 13089, which always runs) will then correctly schedule a retry render.
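A toy model of that exclusion step shows why recording the ping rescues the lane (hypothetical names, not React's actual implementation):

```javascript
// Toy model of the markRootSuspended exclusion described above.
// Hypothetical simplification — real markRootSuspended does more bookkeeping.
const Lane = 524288;

function markRootSuspendedModel(root, suspendedLanes, wipPingedLanes) {
  // Mirrors "suspendedLanes &= ~workInProgressRootPingedLanes".
  suspendedLanes &= ~wipPingedLanes;
  root.suspendedLanes |= suspendedLanes;
  root.pingedLanes &= ~suspendedLanes;
  return root;
}

// Without the fix: the mid-render ping was dropped, so wipPingedLanes is 0.
const broken = markRootSuspendedModel(
  { suspendedLanes: 0, pingedLanes: 0 }, Lane, 0);
console.log(broken.suspendedLanes); // 524288 — lane marked suspended, stuck

// With the fix: the ping was recorded, so the lane is excluded from suspension.
const fixed = markRootSuspendedModel(
  { suspendedLanes: 0, pingedLanes: 0 }, Lane, Lane);
console.log(fixed.suspendedLanes); // 0 — lane stays schedulable
```

With the ping recorded, the lane never enters suspendedLanes, so getNextLanes still sees it as pending work.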

5. Key Evidence Trail

| Step | Tool | Recording | Finding |
| --- | --- | --- | --- |
| 1 | RecordingOverview | Both | Dev: 125 commits, 63 POSTs. Prod: 38 commits, 21 POSTs. 3x fewer renders in prod. |
| 2 | Screenshot | Both | Dev shows "hello world from dev" (complete). Prod shows "hello world from pro" (stuck). |
| 3 | ReactRenders (commits) | Prod | Last commit #37 is SetState for final keystroke. No SuspenseResumed follows — missing final commit. |
| 4 | ReactRenders (commits) | Dev | Last commits #123→#124 show SetState→SuspenseResumed. Pattern completes. |
| 5 | NetworkRequest (detail) | Prod | POST #36 at 5149ms sends "hello world from prod", gets 200 OK. Response arrives at 5155ms with valid RSC data. Data is fine. |
| 6 | SearchSources + ReadSource | Prod | Located pingSuspendedRoot (L13073), markRootSuspended (L12141), prepareFreshStack (L12184). |
| 7 | Logpoint + Evaluate | Prod | At Point:114 (5169ms ping): exitStatus=4, executionContext=2, root.suspendedLanes=0, workInProgressRootPingedLanes=0. Confirmed ping is lost. |
| 8 | Logpoint + Evaluate | Prod | At Point:118 (5191ms markRootSuspended): workInProgressRootPingedLanes=0, lane 524288 permanently marked suspended+warm. |
| 9 | ReadSource (hit counts) | Dev | pingSuspendedRoot L10047: workInProgressRoot === root has 0 hits. Dev pings never fire mid-render. Confirms timing difference is the mechanism. |
| 10 | ReadSource (hit counts) | Prod | pingSuspendedRoot L13087: 4 hits on the mid-render path. L13085 (prepareFreshStack): 0 hits. Confirmed: entered the ternary 4 times but prepareFreshStack never executed. |

6. Red-Team Analysis

Does this explain why it's prod-only? Yes. The race requires time-slicing (concurrent rendering with yields), which only happens in production. Dev mode's extra overhead means renders don't yield, so promise callbacks fire after renders complete.

Is this a React bug or an app bug? React bug. The app correctly uses useDeferredValue + Suspense + server actions. The race is entirely within React's scheduler.

Could this affect other patterns? Yes. Any pattern where a promise resolves during an active concurrent render of the same lane — useDeferredValue, lazy(), data fetching with Suspense — could theoretically hit this. The useDeferredValue + server action pattern is particularly susceptible because server action responses arrive asynchronously at unpredictable times.

Is the fix complete? The suggested fix handles the specific case where exitStatus === 4 && (executionContext & 2) !== 0. An alternative fix could be in markRootSuspended itself (checking root.pingedLanes as a fallback), but fixing pingSuspendedRoot at the point where the ping is dropped is cleaner and more direct.

Analyzing the Agent Results

I went into this hoping to show that having Replay recordings available makes it easier to solve hard bugs. I’ve certainly experienced that myself while working on and using Replay over the last few years! So it was extremely satisfying to see that every single one of the agent runs found this complex React bug and proposed the correct fix, based only on the Replay recordings of the bug!

As Dan noted, his own agent wasn’t able to solve the issue after “a month and a ton of tokens”. It wasn’t until Andrew Clark pointed him in the right direction that his agent was able to add the right log calls to React’s source, rebuild, analyze the logs, repeat the process, and eventually converge on the right answer.

Replay MCP’s tools gave my agents the ability to analyze the runtime behavior without having to keep rebuilding React! The Logpoint tool allowed agents to dynamically evaluate an expression every time a line of code ran, while the Sources tool and its built-in “hit counters per line” values acted as an impromptu profiler to help guide the investigation.

Comparing Agent Behaviors

Looking at the prompts and the results, I see a few key differences:

| Dimension | Run 1 (Baseline) | Run 2 (+Method) | Run 3 (+Context) | Run 4 (+Tools) |
| --- | --- | --- | --- | --- |
| Duration | ~28 min | ~17 min | ~8 min | ~7 min |
| Messages | 154 | 111 | 47 | 42 |
| Context compressions | 6 | 4 | 2 | 2 |
| Found ternary bug | ✅ | ✅ | ✅ | ✅ |
| Fix matches PR #36134 | ✅ (Fix A of 3) | Partial (Fix B) | | |
| Found isThenableResolved gap | ✅ (unique) | | | |
| Proved sync call chain | ✅ (GetStack) | ✅ (ReadSource) | | |
| Quantitative hit counts | ✅ (best) | | | |
| False lead time | ~8 min | ~4 min | ~1 min | <1 min |
| Red-team analysis | Moderate | Moderate | Light | Best |
| Number of fixes proposed | 3 | 2 | 1 | 1 |
| Bottom-up exploration | Extensive | Moderate | Minimal | Minimal |

Replay recordings and MCP tools were sufficient to let all of my agents find and fix the bug, in less than half an hour.

Even with the most basic instructions, just having the recording and the MCP tools was enough for Agent #1 to eventually figure out the root cause and propose a valid solution for an otherwise intractable bug.

Modern agents and models are extremely powerful and capable

We’ve all seen this over the last year, but it’s still incredible to me that an AI can dive into a codebase or problem space and orient itself just by reading some files and produce real value.

Prompting and context are still critical

Just giving the agent a more detailed step-by-step investigation pattern cut the investigation time in half. A few paragraphs of background on React’s scheduling internals cut it in half again. And as I’ve seen in other investigations I’ll cover in a future post, having relevant skills files available can produce drastically better investigation results.

Replay MCP’s tools are exactly the solution Dan Abramov described

Dan said his agent “instrumented the React codebase with logs”, and really needed “information over time”. That’s exactly what Replay MCP provided! Tools like RecordingOverview and ReactRenders to surface info on React behavior and error messages; Logpoint and Evaluate to extract specific values at various points in time; Sources to view source files in the bundle and see the hit counts to understand execution; Screenshot to visually inspect the UI at a given point in time; NetworkRequests to check the requests and results; and even more niche tools like GetStack and DescribePoint to inspect the JS execution flow.

Replay MCP provides all these and more, enabling agents to truly understand runtime behavior over time without rebuilding the app with extra logging. Capture the recording once, with all the tricky timing behavior included, then investigate as deeply as needed, automatically.

What do Replay MCP’s tools look like?

Here are some of the actual MCP tool calls the agents ran during this investigation:

RecordingOverview

# Recording Overview

## Recording Info & Environment

- **URL:** http://localhost:3000/
- **Duration:** 10.3s
- **React:** v19.3.0-canary-f93b9fd4-20251217 (production)
- **Libraries:** NextJS, React

## Console Errors & Warnings

App healthy - no console errors or warnings detected.

## Screenshot

**Final recording state** - screenshot available at timestamp 9955ms.
Use the **Screenshot** tool with `timestamp: 9955` to view it.

## Network Summary

- **Total requests:** 37
- **Failed:** 1
- **Slow:** 0

### API Patterns
- `POST /` - 21 requests
- `GET /` - 1 request

## React Render Summary

- **Total commits:** 38
- **Total render time:** 107ms

### Costliest Commits
- Commit #24: 54ms, 4 fibers - SuspenseResumed
- Commit #13: 4ms, 4 fibers - SuspenseResumed
- Commit #22: 4ms, 4 fibers - SuspenseResumed
- Commit #19: 4ms, 4 fibers - SuspenseResumed
- Commit #9: 4ms, 4 fibers - SuspenseResumed

## Suggested Next Steps

1. **NetworkRequest** - Investigate failed network requests
   _1 failed request_
2. **ReactRenders** - Analyze React render performance
   _38 commits, 107ms total render time_
3. **Screenshot** - View screenshot of the application state
   _Final recording state at 9955ms_

ReactRenders - List

# Commits

|  # |  Time | Duration | Fibers | Mutating | Wasted | Waste% | Trigger         |
| -: | ----: | -------: | -----: | -------: | -----: | -----: | --------------- |
| 28 | 4.52s |    3.3ms |      4 |        1 |      3 |  75.0% | SuspenseResumed |
| 29 | 4.56s |     15µs |      2 |        1 |      1 |  50.0% | SetState        |
| 30 | 4.66s |    1.6ms |      3 |        1 |      2 |  66.7% | SetState        |
| 31 | 4.67s |    3.3ms |      4 |        1 |      3 |  75.0% | SuspenseResumed |
| 32 | 4.76s |      4µs |      2 |        1 |      1 |  50.0% | SetState        |
| 33 | 4.77s |    3.3ms |      4 |        1 |      3 |  75.0% | SuspenseResumed |
| 34 | 4.83s |      4µs |      2 |        1 |      1 |  50.0% | SetState        |
| 35 | 5.07s |    3.3ms |      4 |        1 |      3 |  75.0% | SetState        |
| 36 | 5.08s |    3.3ms |      4 |        1 |      3 |  75.0% | SuspenseResumed |
| 37 | 5.15s |      4µs |      2 |        1 |      1 |  50.0% | SetState        |

Showing 29–38 of 38

## Next Steps

- Use mode=commit&commitIndex=N to inspect a specific commit
- Use mode=summary to see the recording overview

ReactRenders - Commit

# Commit #37

**Timestamp:** 5.15s
**Render duration:** 4µs
**Fibers rendered:** 2 (1 mutating, 1 wasted - 50.0%)

## Triggers

- **SetState**
  - Source: TestPreviewClient (client.tsx:38:25)
  - Point: 11682667932122379955827658622763010

## Components

| Component         | Instances | Renders | Wasted | Duration | DOM Muts |
| ----------------- | --------: | ------: | -----: | -------: | -------: |
| Resolved          |         1 |       1 |      1 |      4µs |        0 |
| TestPreviewClient |         1 |       1 |      0 |      0µs |        1 |

## DOM Mutations (1 total)

- TestPreviewClient: 1

## Next Steps

- Use mode=component&componentName=X to investigate a specific component
- Use mode=commits to browse other commits
- Use mode=summary to see the recording overview

ReadSource

 hits | L | code 
------|------|------
   0 | 12116 |       for (var i = 0; i < tag.length; i++) { 
   0 | 12117 |         var check = tag[i], 
   0 | 12118 |           getSnapshot = check.getSnapshot; 
   0 | 12119 |         check = check.value; 
     | 12120 |         try { 
   0 | 12121 |           if (!objectIs(getSnapshot(), check)) return !1; 
     | 12122 |         } catch (error) { 
   0 | 12123 |           return !1; 
     | 12124 |         } 
     | 12125 |       } 
     | 12126 |     tag = node.child; 
  87 | 12127 |     if (node.subtreeFlags & 16384 && null !== tag) 
   0 | 12128 |       (tag.return = node), (node = tag); 
     | 12129 |     else { 
  87 | 12130 |       if (node === finishedWork) break; 
   0 | 12131 |       for (; null === node.sibling; ) { 
   0 | 12132 |         if (null === node.return || node.return === finishedWork) return !0; 
   0 | 12133 |         node = node.return; 
     | 12134 |       } 
   0 | 12135 |       node.sibling.return = node.return; 
   0 | 12136 |       node = node.sibling; 
     | 12137 |     } 
     | 12138 |   } 
  87 | 12139 |   return !0; 
  87 | 12140 | } 
     | *12141 | function markRootSuspended( 
     | 12142 |   root, 
     | 12143 |   suspendedLanes, 
     | 12144 |   spawnedLane, 
     | 12145 |   didAttemptEntireTree 
     | 12146 | ) { 
  67 | 12147 |   suspendedLanes &= ~workInProgressRootPingedLanes; 
  67 | 12148 |   suspendedLanes &= ~workInProgressRootInterleavedUpdatedLanes; 
  67 | 12149 |   root.suspendedLanes |= suspendedLanes; 
  67 | 12150 |   root.pingedLanes &= ~suspendedLanes; 
  67 | 12151 |   didAttemptEntireTree && (root.warmLanes |= suspendedLanes); 
  67 | 12152 |   didAttemptEntireTree = root.expirationTimes; 
 134 | 12153 |   for (var lanes = suspendedLanes; /*<HIT>*/0 < lanes; ) { 
  67 | 12154 |     var index$6 = 31 - clz32(lanes), 
  67 | 12155 |       lane = 1 << index$6; 
  67 | 12156 |     didAttemptEntireTree[index$6] = -1; 
  67 | 12157 |     lanes &= ~lane; 
     | 12158 |   } 
  67 | 12159 |   0 !== spawnedLane && 
   0 | 12160 |     markSpawnedDeferredLane(root, spawnedLane, suspendedLanes); 
     | 12161 | } 
     | 12162 | function flushSyncWork$1() { 
  21 | 12163 |   return 0 === (executionContext & 6) 
  21 | 12164 |     ? (flushSyncWorkAcrossRoots_impl(0, !1), !1) 
     | 12165 |     : !0; 
     | 12166 | }

Logpoint

Found 46 total hits at turbopack:///[project]/node_modules/.pnpm/[email protected][email protected][email protected][email protected]/node_modules/next/dist/compiled/react-dom/cjs/react-dom-client.production.js:13076 (showing first 20)

Hit 1 at point Point:239 @ 2676.17ms:
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: [
  "ping",
  524288,
  "suspended",
  524800,
  "renderLanes",
  0,
  "exit",
  4,
  "isRoot",
  false,
  "execCtx",
  0
]

Hit 2 at point Point:240 @ 2676.21ms:
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: [
  "ping",
  512,
  "suspended",
  524800,
  "renderLanes",
  0,
  "exit",
  4,
  "isRoot",
  false,
  "execCtx",
  0
]

Hit 3 at point Point:241 @ 2841.09ms:
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: [
  "ping",
  1048576,
  "suspended",
  1049600,
  "renderLanes",
  0,
  "exit",
  4,
  "isRoot",
  false,
  "execCtx",
  0
]

// snip
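A quick note on the lane values in these hits: React lane fields are bitmasks in which each individual lane is a single power of two, so the numbers decode directly. The 524288 and 512 pinged in Hits 1 and 2 are individual lanes, and the 524800 reported in root.suspendedLanes is simply their union (plus nothing else). A minimal sketch, with hypothetical variable names:

```javascript
// Each React lane is one bit; lane sets are bitwise ORs of those bits.
const pingedLaneA = 1 << 19;  // 524288, the lane pinged in Hit 1
const pingedLaneB = 1 << 9;   // 512, the lane pinged in Hit 2

console.log(pingedLaneA);                // 524288
console.log(pingedLaneB);                // 512
console.log(pingedLaneA | pingedLaneB);  // 524800, matching suspendedLanes

// Removing one lane from a set, the same operation markRootSuspended uses
// (suspendedLanes &= ~workInProgressRootPingedLanes):
console.log((pingedLaneA | pingedLaneB) & ~pingedLaneA);  // 512
```

This is why a single dropped ping is enough to wedge the UI: as long as bit 524288 never gets cleared from root.suspendedLanes, any update in that lane stays parked.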

Replay MCP: The Time-Travel Superpower Your Agents Need

I joined Replay because I saw the potential and promise of time-travel debugging and how it can make the debugging process drastically faster and better. It enables solving bugs that are otherwise impossible, and provides insights into real runtime behavior that no other monitoring or analysis tool can provide.

Replay DevTools gave humans the ability to investigate and solve bugs with time travel.

Now, Replay MCP gives your agents the same time-travel superpowers.

You can add Replay MCP to your own agents and workflows today! Plug it in, make a recording of a bug or a failed test, and let your agent do the time-travel investigative work for you.

And, coming soon: we’re working on expanding our existing E2E Playwright and Cypress recording integrations to automatically investigate test failures and provide analysis and recommended fixes! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity.

Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!
