| title | XState Nested and Parallel Patterns for Workflow Orchestration |
|---|---|
| date | 2026-04-12 |
| type | research |
| status | complete |
| parent | kampus/xstate-nested-parallel-patterns |
| cross-references | |

The operator's current workflow.json schema -- a flat map of independent task state machines, each running its own do-work -> qa -> passed/tripped cycle -- is a correct but limited model. It works because every task is modeled as an island. No task knows about any other task. The state-ledger CLI iterates over them one at a time. The operator agent picks one, does it, transitions it, picks another. This is sequential execution with extra bookkeeping.
The limitation is architectural, not functional. The flat map cannot express "these three tasks are independent and should run concurrently" versus "this task depends on that task and must wait." It cannot express "all research subtasks form a parallel phase, and the consolidation task is a sequential successor gated on their joint completion." The operator agent encodes these relationships in its own prompt instructions -- in natural language, in its head -- rather than in the state machine that is supposed to be the source of truth. Every time the operator makes a sequencing decision, it is doing work that the machine should be doing for it.
XState's parallel and compound state primitives solve this directly. A parallel state node activates all child regions simultaneously and gates its onDone transition on their joint completion -- fork-join concurrency as a declarative primitive. Compound (nested) states decompose a single task's lifecycle into substates that share context and event handling. Invoked actors provide hard boundaries when tasks need independent contexts and lifecycle management. The pattern that maps to the operator's needs is straightforward: the top-level machine is sequential (phase1 -> phase2 -> phase3), specific phases are parallel (all research subtasks run concurrently within phase1), and individual tasks within parallel regions are either compound states or invoked submachines depending on whether they need isolation.
The migration is viable but demands respect for XState's sharp edges. The onDone transition on parallel states has been the single most bug-reported feature in XState's history, with at least four correctness violations shipped across versions. History state persistence is broken when round-tripping through JSON. Schema evolution for persisted snapshots is completely unaddressed -- no versioning, no migration path, no compatibility guarantees. The recommendation is clear: use parallel states for the top-level concurrency concern, keep nesting shallow (two levels maximum), avoid history states in parallel regions, pin your XState version, and test onDone completion aggressively. The state-ledger API must evolve from returning flat string state values to returning compound state value objects, and the CLI must learn to display nested state trees.
The prize is worth the cost. A single XState machine replaces the flat task map plus the operator's implicit sequencing logic. The machine becomes the authority not just for individual task state but for workflow topology. Persistence comes free via getPersistedSnapshot(). The operator agent becomes simpler because the machine handles the "what can run next" question, and the agent only handles the "how to do the work" question. The separation of concerns is exactly right.
The foundational insight is that XState's parallel state value is not a flat bag of flags. It is a recursively nested object that mirrors the machine's hierarchical structure, and that structure IS the runtime representation of concurrent state.
A parallel state node (type: "parallel") activates every direct child region simultaneously upon entry. There is no initial property -- every region starts. Events broadcast to all regions. Context is shared across all regions. And the onDone transition fires if and only if every child region has reached a type: "final" state. This is fork-join concurrency modeled declaratively, and it is the primitive the operator needs for concurrent task groups.
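A minimal sketch of such a node, written as a plain TypeScript object in XState's config shape (the task names, event names, and the `consolidation` target are hypothetical illustrations, not the operator's real schema):

```typescript
// Hypothetical fork-join phase: no `initial` property on the parallel
// node (every region starts on entry), and onDone fires only once both
// regions sit in a `type: "final"` state.
const researchPhase = {
  type: "parallel",
  states: {
    taskA: {
      initial: "working",
      states: {
        working: { on: { "TASK_A.DONE": "done" } },
        done: { type: "final" },
      },
    },
    taskB: {
      initial: "working",
      states: {
        working: { on: { "TASK_B.DONE": "done" } },
        done: { type: "final" },
      },
    },
  },
  onDone: { target: "consolidation" }, // the join: all regions final
};
```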
The state value for a machine with a parallel region embedded in a sequential parent looks like this:
```json
{
  "running": {
    "trackA": "task1",
    "trackB": "task3"
  }
}
```

Leaf values are always strings. The nesting is fully recursive -- parallel inside sequential inside parallel produces deeper objects. This maps directly onto the operator's task structure: each task's current status string becomes a leaf in the compound state value.
For querying, snapshot.hasTag() is the most resilient primitive. Tag each task state with its semantic status ('pending', 'running', 'complete', 'failed') and query by tag. This decouples consuming code from the machine's structural hierarchy entirely. If you refactor the nesting, tags still work. snapshot.matches() is the workhorse for specific state checks, supporting partial matching: snapshot.matches({ running: { trackA: 'task1' } }) returns true regardless of trackB's state.
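To make the partial-matching semantics concrete, here is a small standalone re-implementation of the `matches()` comparison over plain `StateValue` objects. This illustrates the documented semantics only; it is not XState's internal code:

```typescript
type StateValue = string | { [key: string]: StateValue };

// True if `pattern` is a (possibly partial) prefix of `value`:
// a string pattern matches an equal string leaf or the top-level key
// of a nested object; an object pattern must match recursively per key.
function matchesValue(value: StateValue, pattern: StateValue): boolean {
  if (typeof pattern === "string") {
    if (typeof value === "string") return value === pattern;
    // e.g. pattern "running" matches { running: { ... } }
    return pattern in value;
  }
  if (typeof value === "string") return false;
  return Object.keys(pattern).every(
    (key) => key in value && matchesValue(value[key], pattern[key])
  );
}
```

Note how the pattern `{ running: { trackA: "task1" } }` matches regardless of `trackB`'s state, because only the keys present in the pattern are checked.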
Persistence is clean. getPersistedSnapshot() produces a five-field JSON object: status, value, historyValue, context, and children. The persisted snapshot is a minimal diff against the machine definition -- you store only leaf state strings and context, not the entire state graph. Restoration via createActor(machine, { snapshot: persisted }) works correctly and starts processing events from the persisted position. For the operator, this means: persist the snapshot, reconstruct everything else from the machine definition at startup.
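The five-field shape can be captured in a type, with a save/load round trip over JSON. A sketch under the assumptions above (the `StateValue` alias is repeated so the block stands alone; `children` is left loosely typed):

```typescript
type StateValue = string | { [key: string]: StateValue };

// The five fields getPersistedSnapshot() produces, per the docs.
interface PersistedSnapshot {
  status: "active" | "done" | "error" | "stopped";
  value: StateValue;
  historyValue: unknown;
  context: Record<string, unknown>;
  children: Record<string, unknown>;
}

// Round-trip through JSON once before writing, so the persisted form
// and any later JSON read of it are structurally identical.
function serializeSnapshot(snap: PersistedSnapshot): string {
  return JSON.stringify(JSON.parse(JSON.stringify(snap)), null, 2);
}

function deserializeSnapshot(json: string): PersistedSnapshot {
  return JSON.parse(json) as PersistedSnapshot;
}
```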
XState gives you two fundamentally different ways to nest behavior, and choosing wrong will cost you.
Compound states are hierarchy without boundaries. Child states share the parent's context, event bus, and lifecycle. Event bubbling means a CANCEL handler on the parent applies to every child for free. The state value nests accordingly: { preparation: 'grinding' }. This is the right tool when nested states are logically part of the same process -- a task's do-work -> qa -> passed substeps are one thing viewed at different zoom levels. The litmus test: can the child make sense without the parent? If no, use a compound state.
Invoked actors are real boundaries with real independence. An invoked actor has its own context, its own event processing, and its own lifecycle tied to the invoking state. The parent only sees "done" or "error" -- the child's internal states are opaque. Communication happens via explicit message passing with sendTo(), replacing v4's implicit sendParent(). The litmus test: does the parent need to observe the child's internal states? If no, invoke it.
For the operator's workflow, the decision maps cleanly. The overall workflow phases (research, implementation, review) are compound states -- they are one process with substeps. Individual tasks within a parallel research phase could be either compound states (if the operator needs to observe their internal substates for status display) or invoked actors (if the operator only cares about done/failed). The recommendation is compound states for tasks within parallel regions, because the operator's status display explicitly shows task substates (do-work, qa, passed, tripped), which requires the parent machine to see into the child's state.
The setup() pattern is the correct way to declare actors in v5. It registers actors by name, giving you type safety, centralized configuration, and testability. The inline approach works but loses type inference and is a code smell in anything non-trivial.
Here is the key architectural insight most tutorials miss: you do not add sequencing to a parallel machine. You build a sequential machine and embed parallel regions where concurrency is needed.
Sequential is the skeleton; parallel is the organ. The top-level states define the pipeline order. Parallelism is nested inside specific steps. The canonical pattern for "tasks 1+2 in parallel, then task 3, then tasks 4+5 in parallel" is three sequential phases where phases 1 and 3 happen to be type: "parallel":
phase1 (parallel: task1 + task2) -> phase2 (sequential: task3) -> phase3 (parallel: task4 + task5) -> complete
The onDone transition on each parallel phase is the barrier. It fires when all regions reach their final states. There is no race condition, no "what if both finish at the same time" edge case. The statechart formalism handles simultaneous completion by definition. This is Promise.all with structure, visibility, and inspectability.
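The barrier rule itself is simple to state in code. A standalone sketch of the predicate `onDone` implements (not XState internals; `RegionStatus` is an invented shape for illustration):

```typescript
// Each region reports its current leaf state plus whether that leaf
// is marked `type: "final"` in the machine definition.
interface RegionStatus {
  region: string;
  state: string;
  isFinal: boolean;
}

// The join barrier: a parallel state's onDone may fire only when every
// region has reached some final state. Simultaneous completion is not
// a race -- the predicate is evaluated over the whole set at once.
function joinComplete(regions: RegionStatus[]): boolean {
  return regions.length > 0 && regions.every((r) => r.isFinal);
}
```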
Each region within a parallel state is a full state machine. It can have its own sequential substates, its own nested parallel states (though exercise restraint: beyond two levels of nesting, debugging becomes impractical), and its own invocations. The nesting is recursive, and so is the onDone bubbling.
Error handling in parallel phases requires an explicit design decision. If a track's failed state has type: "final", the track is "done" (albeit with an error) and onDone fires once all tracks complete. If failed is NOT final, the workflow deadlocks until something retries. Leaving failed as a non-final state without a recovery path is the single most common bug in parallel XState workflows. For the operator, the right choice is: failed/tripped states are final, and the phase's onDone transition uses a guard to check context for errors before proceeding to the next phase. This preserves the circuit breaker pattern -- a tripped task terminates its track but does not block the entire phase.
Cross-region coordination uses three mechanisms: shared context (all regions read/write the same context -- use distinct keys per track to avoid conflicts), event broadcasting (every event goes to all regions -- this is the primary coordination primitive), and stateIn() guards (check a sibling region's state -- use sparingly, as tight coupling between regions defeats the purpose of parallelism).
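The distinct-keys convention for shared context can be sketched as an assign-style updater that is scoped to a single track's key, so concurrent tracks cannot clobber each other's results (the field names are illustrative):

```typescript
interface WorkflowContext {
  results: Record<string, unknown>; // one key per track, e.g. results.task_1
  errors: { task: string; message: string }[];
}

// Returns a new context where only `results[task]` changes; every
// other key (including other tracks' results) is untouched.
function recordResult(
  ctx: WorkflowContext,
  task: string,
  result: unknown
): WorkflowContext {
  return { ...ctx, results: { ...ctx.results, [task]: result } };
}
```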
XState's parallel and nested primitives are production-ready for shallow, well-separated concerns with disjoint event vocabularies. They degrade along every axis simultaneously as complexity increases. The failure modes that matter for the operator:
onDone has been broken repeatedly. At least four correctness bugs shipped: premature firing when the first (not all) regions completed (#326), history nodes counting as regions that never finish (#3170), nested parallel states not bubbling done events (#2349), and various cases of onDone simply not firing (#1111). The nested parallel completion bug was only fixed in PR #4358. If the operator's architecture depends on parallel onDone for correctness -- and it will -- this must be tested aggressively against the pinned XState version.
History state persistence is broken. Issue #5178: getPersistedSnapshot() serializes historyValue as plain objects, but deserialization does not correctly reconstruct StateNode references. History transitions silently route to initial states instead of remembered states. The failure is silent -- no error, just wrong behavior. The operator's current workflow.json uses type: "history" for the blocked -> unblocked flow. This is a direct conflict. Either remove history states from the parallel machine design, or avoid the JSON round-trip and pass snapshot objects directly.
Schema evolution is unaddressed. There is no versioning, migration, or compatibility story for persisted snapshots. If the machine definition changes -- add a state, remove a state, rename a region -- and a snapshot from the old definition is restored, behavior is undefined. Discussion #4828 explicitly calls out the absence of documentation on this. For the operator, which persists workflow-state.json to disk and may need to evolve the schema across sessions, this is the primary operational risk. The mitigation is a version field in workflow.json and explicit snapshot invalidation on version bump.
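A minimal version-gate sketch for that mitigation (the function and field names are hypothetical): refuse to restore a snapshot whose recorded schema version does not match the current workflow.json, forcing a fresh start instead of undefined behavior.

```typescript
interface StoredState {
  version: number;   // bumped on every machine-definition change
  snapshot: unknown; // opaque persisted XState snapshot
}

// Returns the snapshot only when versions agree; otherwise signals
// the caller to discard it and start the machine from its initial state.
function restoreIfCompatible(
  stored: StoredState,
  currentVersion: number
): unknown {
  return stored.version === currentVersion ? stored.snapshot : null;
}
```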
Event broadcasting is a footgun. All parallel regions receive all events. If two regions handle the same event type differently, you must disambiguate with guards or distinct event types. Neither scales. The operator's event vocabulary (DONE, FAIL, PASS, BLOCKED) is generic -- sending DONE to a parallel state would transition every region that handles DONE. The fix is namespaced events: TASK_1.DONE, TASK_2.DONE. This is manual routing with extra steps, but it is the only correct approach.
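Namespacing can be a pair of tiny helpers (hypothetical names; the operator's CLI could use them to keep the old `--task task_1 --event DONE` ergonomics while emitting routed events):

```typescript
// Build "TASK_1.DONE" from a task id and a bare event name.
function namespaceEvent(taskId: string, event: string): string {
  return `${taskId.toUpperCase()}.${event.toUpperCase()}`;
}

// Split "TASK_1.DONE" back into its routing parts. Returns null for
// un-namespaced events so callers can reject them explicitly.
function parseEvent(type: string): { task: string; event: string } | null {
  const dot = type.indexOf(".");
  if (dot <= 0 || dot === type.length - 1) return null;
  return { task: type.slice(0, dot), event: type.slice(dot + 1) };
}
```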
Tooling gives out around 3 levels of nesting. The Stately visualizer breaks on compound states with mixed parallel/non-parallel children. The inspector chokes at ~100KB of stringified context. TypeScript type inference slows dramatically with deep nesting (microsoft/TypeScript#39826). The practical ceiling is 2-3 levels before you lose the ability to inspect, visualize, or get fast IDE feedback.
State explosion is implicit. Three parallel regions with 3 states each = 27 possible combinations. XState provides no way to declare that certain combinations are invalid. Guards are runtime checks, not compile-time guarantees. Invalid states must be prevented by careful event handling, not by the type system.
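The combinatorics are easy to see by enumeration. A sketch that counts the configuration space of a parallel state and filters it by a runtime validity predicate (the predicate here is an arbitrary example, not an operator rule):

```typescript
// Cartesian product of each region's possible leaf states.
function configurations(regions: string[][]): string[][] {
  return regions.reduce<string[][]>(
    (acc, states) => acc.flatMap((combo) => states.map((s) => [...combo, s])),
    [[]]
  );
}

const regionStates = [
  ["do-work", "qa", "passed"],
  ["do-work", "qa", "passed"],
  ["do-work", "qa", "passed"],
];
const all = configurations(regionStates); // 3^3 = 27 combinations

// Example invalid-state rule, enforceable only at runtime:
// "no task may be in qa while another is still in do-work".
const valid = all.filter(
  (c) => !(c.includes("qa") && c.includes("do-work"))
);
```

Nothing in the type system rules out the other twelve combinations; only event handling keeps the machine off them.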
The current flat schema:
```json
{
  "id": "feature-name",
  "version": 1,
  "tasks": {
    "task_1": { "initial": "do-work", "states": { ... } },
    "task_2": { "initial": "do-work", "states": { ... } }
  }
}
```

The proposed parallel-aware schema:
```json
{
  "id": "feature-name",
  "version": 2,
  "generated": "2026-04-12",
  "machine": {
    "id": "feature-name",
    "initial": "phase1",
    "context": {
      "results": {},
      "errors": []
    },
    "states": {
      "phase1": {
        "type": "parallel",
        "states": {
          "task_1": {
            "initial": "do-research",
            "context": { "retries": 0, "maxRetries": 3 },
            "states": {
              "do-research": {
                "on": {
                  "TASK_1.DONE": "qa-research",
                  "TASK_1.BLOCKED": "blocked",
                  "TASK_1.TRIPPED": "tripped"
                }
              },
              "qa-research": {
                "on": {
                  "TASK_1.PASS": "passed",
                  "TASK_1.FAIL": [
                    { "target": "do-research", "guard": "retriesRemaining", "actions": "incrementRetries" },
                    { "target": "tripped" }
                  ]
                }
              },
              "blocked": {
                "on": { "TASK_1.UNBLOCKED": "do-research" }
              },
              "passed": { "type": "final" },
              "tripped": { "type": "final" }
            }
          },
          "task_2": {
            "initial": "do-research",
            "states": {
              "do-research": {
                "on": {
                  "TASK_2.DONE": "qa-research",
                  "TASK_2.TRIPPED": "tripped"
                }
              },
              "qa-research": {
                "on": {
                  "TASK_2.PASS": "passed",
                  "TASK_2.FAIL": [
                    { "target": "do-research", "guard": "retriesRemaining", "actions": "incrementRetries" },
                    { "target": "tripped" }
                  ]
                }
              },
              "passed": { "type": "final" },
              "tripped": { "type": "final" }
            }
          }
        },
        "onDone": [
          { "target": "phase2", "guard": "noErrors" },
          { "target": "tripped" }
        ]
      },
      "phase2": {
        "initial": "do-consolidation",
        "states": {
          "do-consolidation": {
            "on": {
              "CONSOLIDATION.DONE": "qa-consolidation",
              "CONSOLIDATION.TRIPPED": "tripped"
            }
          },
          "qa-consolidation": {
            "on": {
              "CONSOLIDATION.PASS": "passed",
              "CONSOLIDATION.FAIL": [
                { "target": "do-consolidation", "guard": "retriesRemaining", "actions": "incrementRetries" },
                { "target": "tripped" }
              ]
            }
          },
          "passed": { "type": "final" },
          "tripped": { "type": "final" }
        },
        "onDone": "complete"
      },
      "complete": { "type": "final" },
      "tripped": { "type": "final" }
    }
  }
}
```

Key design decisions:
- **Single machine, not a flat map.** The entire workflow is one XState machine definition under `machine`. The state-ledger builds and runs this machine directly.
- **Namespaced events.** `TASK_1.DONE` instead of `DONE`. This prevents event broadcasting from transitioning unintended regions.
- **History states removed.** The `blocked -> hist` pattern from the current schema is replaced with `blocked -> do-research` (explicit target). This avoids the history state serialization bug (#5178) entirely.
- **`tripped` is final.** Both at the task level (within a parallel region) and at the workflow level. A tripped task completes its region, allowing the parallel phase's `onDone` to fire. The guarded `onDone` transition checks for errors before proceeding.
- **Phase-level circuit breaker.** The `onDone` guard on phase1 checks `context.errors`. If any task tripped, the workflow itself trips rather than proceeding to phase2 with incomplete inputs.
- **Version field bumped to 2.** This signals the schema change and triggers snapshot invalidation in the state-ledger.
The current Ledger interface assumes flat string state values:
```typescript
interface TaskStatus {
  state: string; // flat: "do-work", "qa", "passed"
  retries: number;
  maxRetries: number;
  final: boolean;
}
```

The new interface must handle compound state values:
```typescript
// The state value is now the XState StateValue type --
// a string for atomic states, a nested object for compound/parallel states
type StateValue = string | { [key: string]: StateValue };

interface WorkflowStatus {
  value: StateValue; // e.g., { phase1: { task_1: "do-research", task_2: "qa-research" } }
  status: 'active' | 'done' | 'error' | 'stopped';
  context: Record<string, unknown>;
  tasks: Record<string, { // derived: flatten the parallel regions into per-task status
    state: string; // leaf state string for this task
    final: boolean;
  }>;
}

interface TransitionResult {
  previous: StateValue;
  event: string;
  current: StateValue;
  taskAffected: string; // which task actually transitioned (derived from event namespace)
}

interface Ledger {
  status(): Promise<WorkflowStatus>;
  transition(event: string): Promise<TransitionResult>; // no taskId needed -- event namespace routes it
  history(): Promise<TransitionHistoryEntry[]>; // workflow-level history, not per-task
  tasks(): Promise<Record<string, { state: string; final: boolean }>>; // convenience: flat task view
}
```

The critical changes:
- **`transition()` takes an event, not a taskId + event.** The event namespace (`TASK_1.DONE`) routes to the correct region. The state-ledger no longer needs to know which task to target -- the machine handles routing.
- **`status()` returns compound state values.** The caller must understand nested objects. A convenience `tasks()` method provides the flat view by walking the state value tree.
- **The `stateValue()` helper must handle objects.** The current implementation `JSON.stringify`s non-string values. The new implementation should return the raw `StateValue` and let callers decide how to display it.
- **History becomes workflow-level.** With a single machine, transition history is a single ordered log. Per-task history is derivable by filtering on event namespace.
- **The `--task` flag on the CLI becomes optional.** `state-ledger transition --dir <path> --event TASK_1.DONE` replaces `state-ledger transition --dir <path> --task task_1 --event DONE`. The old form can be sugar that prepends the task namespace.
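The `tasks()` flat view can be derived by walking the compound state value. A standalone sketch (the `finalStates` set is an assumption standing in for what the real implementation would read from the machine definition):

```typescript
type StateValue = string | { [key: string]: StateValue };

// Walk the nested state value and record each string leaf under its
// immediate parent key -- i.e., the task name directly above the leaf.
function flattenTasks(value: StateValue): Record<string, string> {
  const out: Record<string, string> = {};
  const walk = (v: StateValue) => {
    if (typeof v === "string") return;
    for (const [key, child] of Object.entries(v)) {
      if (typeof child === "string") out[key] = child;
      else walk(child);
    }
  };
  walk(value);
  return out;
}

// Hypothetical: in the real ledger this would come from the machine.
const finalStates = new Set(["passed", "tripped", "complete"]);

function taskView(value: StateValue) {
  return Object.fromEntries(
    Object.entries(flattenTasks(value)).map(([task, state]) => [
      task,
      { state, final: finalStates.has(state) },
    ])
  );
}
```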
These are not theoretical concerns. Each rule is derived from a documented bug, a silent failure mode, or an empirical degradation threshold.
- **Pin your XState version.** The `onDone` + parallel state combination has been broken, re-broken, and partially fixed across at least four major issues. Do not assume an upgrade will not regress this. Lock the version in package.json with an exact version, not a range.
- **Do not use history states in parallel regions.** History state persistence is broken (Issue #5178). The JSON round-trip corrupts `historyValue`, causing silent fallback to initial states. Use explicit transition targets instead of history nodes.
- **Namespace all events in parallel regions.** Events broadcast to all regions. `DONE` sent to a parallel state transitions every region that handles `DONE`. Use `TASK_1.DONE`, `TASK_2.DONE` to route events to specific tracks.
- **Keep nesting to 2 levels maximum.** XState imposes no depth limit, but tooling (visualizer, inspector, TypeScript inference) degrades at 3+ levels. Parallel inside sequential is one level. Sequential inside parallel inside sequential is two. Stop there.
- **Make `failed`/`tripped` states final.** If a failed state is not `type: "final"`, the parallel parent's `onDone` never fires for that region. The workflow deadlocks. Always make error terminal states final, and use guarded `onDone` transitions to check for errors before proceeding.
- **Give each parallel track its own context key.** All regions share one context object. If two regions `assign` to the same key, the last one wins. Use `context.results.task_1`, `context.results.task_2` -- no overlap, no surprises.
- **Do not rely on `raise` for cross-region communication.** `raise` in a parallel state transition does not always propagate to sibling regions in the same microstep (Discussion #4456). Use `sendTo` targeting the actor itself for reliable delivery.
- **Do not target sibling region states.** Cross-region transitions (from one parallel region to a state in a sibling region) are not reliably supported (Issue #518). The exit action fires but the state change does not occur. Each region must be self-contained.
- **Version your workflow.json.** There is no schema evolution story for persisted snapshots. If the machine definition changes and an old snapshot is restored, behavior is undefined. Bump the version field on any machine change and invalidate stale snapshots.
- **Double-serialize for JSON safety.** `getPersistedSnapshot()` sets `output` and `error` to `undefined`, which `JSON.stringify` silently drops. Use `JSON.parse(JSON.stringify(snapshot))` before persisting to convert `undefined` to absent keys.
- **Test `onDone` completion for every parallel phase.** Write explicit tests that advance all regions to their final states and verify the `onDone` transition fires. Write tests where some regions fail and verify the guarded `onDone` routes correctly. Do not trust that this works from documentation alone.
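The JSON-safety rule above is easy to demonstrate in isolation: `undefined`-valued keys vanish under `JSON.stringify`, so normalizing the object before persisting makes the stored and in-memory shapes identical. A sketch with a snapshot-like object (the field names mirror the ones discussed earlier; this is not a real XState snapshot):

```typescript
// A snapshot-like object with the undefined-valued fields that
// getPersistedSnapshot() can produce on an active actor.
const snapshotLike: Record<string, unknown> = {
  status: "active",
  value: { phase1: { task_1: "do-research" } },
  output: undefined,
  error: undefined,
};

// Round-trip: undefined-valued keys become absent keys, so the
// normalized object matches what any later JSON read will see.
const normalized = JSON.parse(JSON.stringify(snapshotLike));
```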
- **History state replacement:** the current `blocked -> hist` pattern resumes at the pre-blocked state. Removing history states means hard-coding the resume target (e.g., `blocked -> do-work`). Is this acceptable, or does the operator need a more sophisticated "resume where you left off" mechanism?
- **Context isolation:** in the proposed schema, all tasks in a parallel phase share one context. The operator's circuit breaker uses per-task `retries` and `maxRetries`. Should these be namespaced in shared context (`context.task_1.retries`), or should tasks be invoked actors with their own contexts? Invoked actors solve the isolation problem but make task substates opaque to the parent.
- **Event generation:** who constructs the namespaced event string? The operator agent currently sends bare `DONE`/`FAIL` events. With namespaced events, the agent must know the task's namespace. Should the state-ledger CLI accept `--task task_1 --event DONE` and internally construct `TASK_1.DONE`, preserving the current DX?
- **Schema migration:** when the workflow.json version bumps, what happens to in-flight workflows? The simplest answer is "invalidate and restart." Is there a case where partial progress must be preserved across schema changes?
- **Consolidation task dependency:** the proposed schema gates phase2 (consolidation) on phase1 (all subtasks) via sequential ordering. But the operator's research workflow sometimes needs a consolidation task that reads the outputs of all subtasks. How should subtask outputs flow into the consolidation task's input? Context accumulation in the shared `results` object? Explicit input mapping in the schema?
- **`onDone` guard reliability:** the guarded `onDone` pattern (`[{ target: "phase2", guard: "noErrors" }, { target: "tripped" }]`) relies on context being updated before the guard evaluates. Is this guaranteed by XState's event-processing semantics, or is there a timing risk where `assign` actions in child regions haven't flushed before the parent's `onDone` guard runs?
- Stately Docs: Parallel States -- Parallel state semantics, event routing, `onDone` join pattern
- Stately Docs: Persistence -- `getPersistedSnapshot()`, `createActor` restoration
- Stately Docs: States -- `StateValue` representation, `matches()`, `hasTag()` APIs
- Stately Docs: Parent States -- Compound states, event bubbling, `initial`, `onDone`
- Stately Docs: Invoke -- Invoke API, `onDone`/`onError`, parent-child communication
- Stately Docs: Actors -- Actor types, invoke vs spawn, capabilities matrix
- Stately Docs: Setup -- `setup()` for named actors with type safety
- Stately Docs: Final States -- `type: "final"` semantics
- Stately Docs: Guards -- Guard semantics, `stateIn()` guard
- Stately Blog: Persisting and Restoring State -- Full persistence lifecycle
- Baptiste Devessier: Parallel States and Events -- Event broadcasting across parallel regions
- Tim Deschryver: Building Incremental Views -- Parallel states for progressive data loading
- DEV Community: Improve child to parent communication with XState 5 -- `sendTo` + `parentRef` pattern
- Sandro Maglione: State machines and Actors in XState v5 -- Actor model architecture, root-level invoke
- This Dot Labs: Using XState Actors to Model Async Workflows Safely -- Async workflow patterns
- DEV Community: Nested and Parallel States Using Statecharts -- Nesting patterns introduction
- DeepWiki: State Snapshots and Context -- `MachineSnapshot` fields, `StateValue` type
- DeepWiki: Persistence and Rehydration -- Persisted snapshot structure, restoration pipeline
- Statecharts.dev: Parallel State Glossary -- Formalism reference
- Issue #5178: Restoring state breaks history behaviour -- History state serialization bug
- Issue #4383: stateIn guard ID resolution -- `stateIn` guard broken for parallel state IDs
- Issue #3170: History states break parallel onDone -- History node counted as unfinished region
- Issue #2349: Nested parallel states don't bubble done events -- SCXML deviation, fixed in PR #4358
- Issue #1341: onDone not triggered -- Various `onDone` failures
- Issue #518: Cross-region transitions -- Sibling region targeting fails silently
- Issue #452: Visualizer crashes on nested parallel/compound -- Tooling fragility
- Issue #2048: Inspector chokes at ~100KB context -- Serialization performance
- Discussion #4828: Schema evolution for persisted snapshots -- No migration story
- Discussion #4456: raise in parallel state transitions -- Cross-region raise unreliable
- Discussion #1829: Transition type defaults vs SCXML -- Internal vs external transition semantics
- Discussion #4716: getPersistedSnapshot null vs undefined -- Serialization wart
- Discussion #4697: TypeScript and extracting compound states -- TS type inference limitations
- Discussion #2181: Parallel state machine patterns -- Community discussion
- Microsoft/TypeScript #39826: XState types cause slow type checking -- Deep parameterized State types