A unified framework drawing from graph theory, category theory, finite state machines, and compiler theory — applied to the design of agent harnesses, orchestrators, and quality systems.
Today's agent orchestration is predominantly ad hoc: markdown skills, prompt chains, and custom harness code. This creates several failure modes:
- No formal verification — you can't prove a workflow terminates, doesn't deadlock, or satisfies safety properties
- No composability — quality gates, eval patterns, and workflow stages are one-off implementations
- No visual comprehensibility — workflows live in code, not in inspectable graph structures
- No scaling theory — adding more LLM calls for "safety" can actually degrade quality (Chen et al. 2024)
- No separation between orchestration logic and agent logic — the control flow is entangled with the prompts
The CS formalisms below address each of these problems with well-established theory and tooling.
A task decomposition is a directed acyclic graph D = (V, S):
- V = {v_1, ..., v_n}: sub-task nodes
- S ⊆ V × V: dependency edges
- Acyclicity: no path v_i → ... → v_i
From Allegrini et al. (2025):
D := LLM.Build_Task_DAG(I_U, {EEinfo})
The LLM can construct the DAG, but the structure itself is a formal object with provable properties.
1. Topological scheduling. Compute the execution order automatically:
for each node v in topological_sort(D):
    if all pred(v) ∈ COMPLETED:
        schedule(v)  # can run in parallel with other ready nodes
2. Deadlock-freedom by construction. Acyclic graphs cannot produce circular wait conditions. This is a theorem, not a hope.
3. Parallelism detection. Nodes with no edges between them can execute concurrently. The DAG reveals this automatically — no manual parallelism annotations needed.
4. Visual comprehensibility. A DAG is a picture. You can render it, inspect it, trace execution through it. This is the "n8n-style workflow" insight — the graph IS the observability surface.
5. Dependency validation. Before execution, check: are all required inputs available? Are all required capabilities registered? Is the graph well-formed? These are static checks on the graph structure.
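The scheduling loop and the parallelism detection above can be combined into one small routine. A minimal sketch using Kahn's algorithm; the graph shape and node names are illustrative, not a real orchestrator API:

```typescript
// Minimal DAG scheduler sketch using Kahn's algorithm. Each returned batch
// contains nodes whose dependencies are all complete, so a batch could be
// dispatched in parallel. Throws if the graph is not actually acyclic.
type Dag = Record<string, string[]>; // node -> list of dependency nodes

function readyBatches(dag: Dag): string[][] {
  const remaining = new Map<string, Set<string>>();
  for (const [node, deps] of Object.entries(dag)) remaining.set(node, new Set(deps));

  const batches: string[][] = [];
  while (remaining.size > 0) {
    // Every node whose dependencies are all complete is ready now.
    const batch = [...remaining.keys()].filter(n => remaining.get(n)!.size === 0);
    if (batch.length === 0) throw new Error("cycle detected: not a DAG");
    batches.push(batch);
    for (const n of batch) remaining.delete(n);
    for (const deps of remaining.values()) for (const n of batch) deps.delete(n);
  }
  return batches;
}

// research and lint share no edges, so they land in the same (parallel) batch.
const plan = readyBatches({
  research: [], lint: [], evaluate: ["research"], write: ["evaluate", "lint"],
});
```

Note that the parallelism falls out of the structure: nothing in the input marks research and lint as concurrent, yet the first batch contains both.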
The DAG structure enables a powerful hybrid:
- LLM constructs the DAG — decomposes intent into sub-tasks with dependencies
- Code enforces the DAG — the orchestrator follows topological order, enforces dependencies
- Quality gates are DAG nodes — eval checkpoints are first-class nodes with edges to downstream tasks
This is exactly the "mix of both" described in the notes: code that follows a strict FSM gating transitions, with LLMs handling the creative/analytical work within each node.
AgentSeam's Layer 4 (Session) already tracks turns and events. The DAG formalism extends this: a workflow is a DAG of sessions (or turns within a session). Each node is a unit of agent work. Edges are data dependencies. The orchestrator is a separate concern that schedules according to the DAG.
From Allegrini et al. (2025), the task lifecycle is:
L = (S_t, s_0, E_t, δ)
S_t = {CREATED, AWAITING_DEPENDENCY, READY, DISPATCHING,
IN_PROGRESS, COMPLETED, FAILED, RETRY_SCHEDULED,
FALLBACK_SELECTED, CANCELED, ERROR}
s_0 = CREATED
δ: S_t × E_t → S_t (deterministic transition function)
1. Formal verification. Express properties in CTL temporal logic:
- Safety: AG(state=DISPATCHING → previous_state=READY) — "you can never dispatch without being ready"
- Liveness: AG(state=CREATED → AF(COMPLETED ∨ ERROR ∨ CANCELED)) — "every task eventually terminates"
- Fairness: AG(state=AWAITING_DEPENDENCY → AF(state ≠ AWAITING_DEPENDENCY)) — "nothing waits forever"
These aren't aspirational — they're checkable by automated model checkers (SPIN, NuSMV, TLA+).
2. Guard conditions. Transitions have preconditions:
- DISPATCHING only from READY (TL₅)
- COMPLETED only from IN_PROGRESS (TL₆)
- Terminal states are absorbing (TL₇, TL₉)
This prevents the "session looks stuck after abort" failure mode from claude-session-platform v1. The FSM makes illegal transitions unrepresentable.
3. Explicit recovery paths. The FAILED → RETRY_SCHEDULED → DISPATCHING cycle and FAILED → FALLBACK_SELECTED → DISPATCHING path are first-class transitions, not error-handling afterthoughts.
4. Observable state. Every task is in exactly one state at any time. Observability is trivial — just read the state. No need to infer "what's happening" from a stream of events.
The key insight from the notes: code controls the FSM, LLMs do the work within states.
[CREATED] --code checks dependencies--> [READY]
[READY] --code dispatches agent--> [DISPATCHING]
[DISPATCHING] --agent runtime--> [IN_PROGRESS]
[IN_PROGRESS] --LLM produces output--> [AWAITING_EVAL]
[AWAITING_EVAL] --eval runs--> [COMPLETED] or [FAILED]
[FAILED] --code checks retry policy--> [RETRY_SCHEDULED]
The transitions between states are deterministic code. The work within each state is LLM-driven. Quality gates are transitions guarded by eval results. This gives you the control of compiled code with the flexibility of LLM agents.
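A sketch of the code side of this split, using a reduced state set (with the AWAITING_EVAL state from the diagram above). The transition table, not the caller, decides what is legal, so illegal transitions fail loudly instead of producing a stuck session:

```typescript
// Code-enforced FSM sketch: the `legal` table is the single source of truth
// for transitions. State names follow the lifecycle above; this is a reduced,
// illustrative subset, not the full Allegrini state set.
type TaskState =
  | "CREATED" | "READY" | "DISPATCHING" | "IN_PROGRESS"
  | "AWAITING_EVAL" | "COMPLETED" | "FAILED" | "RETRY_SCHEDULED";

const legal: Record<TaskState, TaskState[]> = {
  CREATED: ["READY"],
  READY: ["DISPATCHING"],
  DISPATCHING: ["IN_PROGRESS"],
  IN_PROGRESS: ["AWAITING_EVAL"],
  AWAITING_EVAL: ["COMPLETED", "FAILED"],
  FAILED: ["RETRY_SCHEDULED"],
  RETRY_SCHEDULED: ["DISPATCHING"],
  COMPLETED: [], // terminal states are absorbing
};

function transition(from: TaskState, to: TaskState): TaskState {
  if (!legal[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```

The LLM never touches this table; it only does work inside IN_PROGRESS, and an eval result selects between the two AWAITING_EVAL successors.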
AgentSeam already has a 10-state session model with 5 flags. The Allegrini model suggests extending this with:
- Guard conditions on transitions (formalized, not just convention)
- Temporal logic properties that can be verified
- Recovery paths as first-class transitions (not exception handling)
A category C consists of:
- Objects: Types (input/output schemas of workflow stages)
- Morphisms (arrows): Transformations between types (workflow stages)
- Composition: If f: A → B and g: B → C, then g ∘ f: A → C
- Identity: For each object A, there exists id_A: A → A
- Associativity: h ∘ (g ∘ f) = (h ∘ g) ∘ f
1. Principled composability. If you can define the type of each workflow stage (its input and output), composition is automatic. You don't need to know how a stage works internally — just its type signature.
extract: RawData → StructuredData
transform: StructuredData → NormalizedData
evaluate: NormalizedData → QualityReport
// Compose:
pipeline = evaluate ∘ transform ∘ extract
// Type: RawData → QualityReport
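In TypeScript the same composition is ordinary function composition, with the compiler checking that adjacent types line up. The domain types below are placeholder aliases, not a real schema:

```typescript
// Typed pipeline composition sketch. The stage types are illustrative
// placeholders; the point is that composition type-checks automatically.
type RawData = { text: string };
type StructuredData = { fields: string[] };
type NormalizedData = { fields: string[] };
type QualityReport = { score: number };

const extract = (r: RawData): StructuredData => ({ fields: r.text.split(" ") });
const transform = (s: StructuredData): NormalizedData =>
  ({ fields: s.fields.map(f => f.toLowerCase()) });
const evaluate = (n: NormalizedData): QualityReport =>
  ({ score: n.fields.length });

// pipeline = evaluate ∘ transform ∘ extract : RawData → QualityReport
const pipeline = (r: RawData): QualityReport => evaluate(transform(extract(r)));
```

Swapping in a different `transform` with the same type signature requires no change to `pipeline`'s callers, which is the composability claim in concrete form.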
2. Functors for domain adaptation. A functor F: C → D maps objects and morphisms from one category to another, preserving composition. This is the answer to "how to have composability across domains":
// Generic quality gate pattern (Category C):
gate: WorkProduct → QualityReport
// Functor F maps to code review domain (Category D):
F(gate): PullRequest → CodeReviewReport
// Functor G maps to content domain (Category E):
G(gate): Article → ContentQualityReport
The gate pattern is defined once. Functors adapt it to specific domains. The composition laws guarantee the adapted version still works correctly.
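In TypeScript a generic is a serviceable approximation of this: the gate is written once over a type parameter, and each domain instantiates it with its own scorer. The domain types and scoring rules below are illustrative placeholders:

```typescript
// Functor-flavored sketch: one generic gate definition, instantiated per
// domain. Generics only approximate functors, but the pattern-reuse is real.
type QualityReport = { pass: boolean; score: number };

// Defined once, for any work product, given a domain-specific scorer.
const makeGate = <W>(score: (w: W) => number, threshold: number) =>
  (work: W): QualityReport => {
    const s = score(work);
    return { pass: s >= threshold, score: s };
  };

type PullRequest = { linesChanged: number; hasTests: boolean };
type Article = { words: number };

// Two "domain adaptations" of the same gate pattern (criteria are toy examples).
const codeGate = makeGate<PullRequest>(pr => (pr.hasTests ? 1 : 0), 1);
const contentGate = makeGate<Article>(a => Math.min(a.words / 500, 1), 0.8);
```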
3. Natural transformations for workflow migration. A natural transformation η: F ⇒ G converts between two functors — i.e., between two domain adaptations of the same pattern. This enables migrating workflows between domains while preserving structure.
4. Monads for sequencing with effects. In functional programming, monads handle sequencing of operations with side effects. In agentic workflows, the "effects" are: LLM calls (non-deterministic), tool execution (side-effecting), quality gates (potentially failing). A monad captures the pattern:
type AgentStep<A> = {
  run: (context: Context) => Promise<Result<A>>
}

// Composition via bind/flatMap:
extractStep.flatMap(data =>
  transformStep(data).flatMap(normalized =>
    evaluateStep(normalized)
  )
)
The notes describe the vision: "smaller building blocks that allow you to state certain principles, and then you can mix and match." Category theory formalizes this:
Building blocks are morphisms:
- qualityGate: WorkProduct → EvalResult
- llmJudge: Content → Verdict
- humanReview: Verdict → Decision
- retry: FailedResult → WorkProduct (with retry policy)
Composition gives you workflows:
fullReview = humanReview ∘ llmJudge ∘ qualityGate
autoReview = retry ∘ llmJudge ∘ qualityGate
Functors give you domain adaptation:
- Same qualityGate pattern, adapted to code review, content review, data validation
- The functor specifies the domain-specific criteria
- The composition structure is preserved
AgentSeam's event bus and enrichment consumer pattern (CAL) is a natural transformation: it watches one category of events (raw session events) and produces another (semantic annotations). The bus itself is a functor from the "session event" category to the "enrichment event" category.
The Layer 3 normalizer is already a functor: it maps from the "Claude SDK message" category to the "AgentMessage" category, preserving composition (message sequences map to message sequences).
A compiler transforms source code through a series of intermediate representations:
Source Code → [Frontend] → AST → [Middle-end] → IR → [Backend] → Machine Code
↓
Pass 1: Type checking
Pass 2: Optimization
Pass 3: Dead code elimination
Pass N: ...
An agent pipeline transforms user intent through a series of intermediate representations:
User Intent → [Decomposition] → Task DAG → [Execution] → Results → [Synthesis] → Output
↓
Pass 1: Dependency resolution
Pass 2: Capability matching
Pass 3: Quality gate evaluation
Pass N: ...
1. Intermediate Representations (IRs). Each stage of the pipeline operates on a well-defined IR. The IR is the contract between stages. As long as the IR is respected, stages can be swapped independently.
For agent pipelines:
- Task DAG is the IR between decomposition and execution
- AgentMessage stream is the IR between runtime and observation (AgentSeam Layer 3)
- SessionEvent log is the IR between execution and attention derivation (AgentSeam Layer 4→5)
- Eval report is the IR between quality gates and retry logic
2. Multi-pass optimization. Compilers run multiple passes over the same IR, each improving it. Agent pipelines can do the same:
Pass 1: Static analysis — check types, schemas, dependencies
Pass 2: Cost estimation — estimate token cost per sub-task
Pass 3: Parallelism detection — identify independent sub-tasks
Pass 4: Quality gate insertion — add eval nodes at critical junctions
Pass 5: Resource allocation — assign models/providers to sub-tasks
These passes transform the Task DAG before execution begins. Each pass is independent and composable.
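Because every pass shares the Task DAG as its IR, passes are just functions from DAG to DAG and compose freely. A minimal sketch; the DagIR shape and the two passes are illustrative, not a proposed schema:

```typescript
// Pass pipeline sketch: every pass has type DagIR -> DagIR, so any subset of
// passes can run in any order that makes sense. Shapes are illustrative.
type DagIR = { nodes: string[]; edges: [string, string][]; notes: string[] };
type Pass = (d: DagIR) => DagIR;

// Static-analysis pass: every edge must reference declared nodes.
const checkDeps: Pass = d => {
  const known = new Set(d.nodes);
  for (const [from, to] of d.edges)
    if (!known.has(from) || !known.has(to))
      throw new Error(`unknown node in edge ${from} -> ${to}`);
  return d;
};

// Analysis pass: nodes with no incoming edges can start immediately, in parallel.
const detectParallelism: Pass = d => {
  const hasIncoming = new Set(d.edges.map(([, to]) => to));
  const roots = d.nodes.filter(n => !hasIncoming.has(n));
  return { ...d, notes: [...d.notes, `initially parallel: ${roots.join(", ")}`] };
};

const runPasses = (d: DagIR, passes: Pass[]): DagIR =>
  passes.reduce((ir, pass) => pass(ir), d);
```

Cost estimation, quality-gate insertion, and resource allocation would slot in as further `Pass` values without touching the runner.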
3. Compilation as verification. LangGraph already uses this pattern: after building a StateGraph with nodes and edges, you call .compile() which performs structural checks (no orphaned nodes, valid edge targets). This is the compiler frontend verifying syntax before generating code.
Extended to formal verification:
compile(graph) → {
check: no orphaned nodes
check: all edges target valid nodes
check: no cycles (DAG property)
check: all required inputs satisfied
check: safety properties hold (CTL model checking)
check: liveness properties hold (termination guaranteed)
optimize: detect parallelizable stages
optimize: compute optimal LLM call counts per node (Chen et al.)
}
4. Abstract interpretation. Compilers use abstract interpretation to reason about program behavior without executing it. For agent pipelines: simulate the workflow with abstract "types" instead of actual data to detect type mismatches, missing dependencies, or unreachable states before running expensive LLM calls.
5. SSA form and data flow. Static Single Assignment form tracks where each value is defined and used. For agent pipelines: track where each artifact is produced and consumed. Detect unused outputs, missing inputs, and redundant computations.
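A def-use check in this spirit reduces to set arithmetic over artifact names. The stage shape below is hypothetical; a real implementation would also whitelist final deliverables so they don't count as unused:

```typescript
// Def-use analysis sketch: find artifacts consumed but never produced
// (missing inputs) and artifacts produced but never consumed (dead outputs).
type Stage = { name: string; produces: string[]; consumes: string[] };

function defUseIssues(stages: Stage[], pipelineInputs: string[]) {
  const produced = new Set([...pipelineInputs, ...stages.flatMap(s => s.produces)]);
  const consumed = new Set(stages.flatMap(s => s.consumes));
  return {
    missing: [...consumed].filter(a => !produced.has(a)), // no producer anywhere
    unused: stages.flatMap(s => s.produces).filter(a => !consumed.has(a)),
  };
}
```

Running this before execution catches wiring mistakes (a stage consuming an artifact nothing emits) without spending a single LLM token.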
Instead of imperative harness code, define workflows declaratively and compile them:
const workflow = defineWorkflow({
nodes: {
research: { agent: "researcher", input: TaskSpec, output: ResearchReport },
evaluate: { eval: "quality-gate", input: ResearchReport, output: EvalResult },
write: { agent: "writer", input: ResearchReport, output: Article },
review: { eval: "content-review", input: Article, output: ReviewResult },
},
  edges: [
    "research -> evaluate",
    "evaluate[pass] -> write",
    "evaluate[fail] -> research", // retry
    "write -> review",
    "review[pass] -> output",
    "review[fail] -> write",     // revise
  ],
constraints: {
maxRetries: { research: 3, write: 2 },
timeout: { research: "5m", write: "10m" },
}
})
const compiled = compile(workflow)
// Verified: no deadlocks, all paths terminate, type safety holds
// Optimized: research and write can't run in parallel (dependency)
// Estimated: ~45k tokens, ~$0.15 per execution

AgentSeam's layer architecture IS a compiler pipeline:
- Layer 2 (Runtime) = Frontend — produces raw events from source (agent runtime)
- Layer 3 (Normalization) = Middle-end — transforms to canonical IR (AgentMessage)
- Layer 4 (Session) = Optimizer — maintains state, enforces invariants
- Layer 5 (Attention) = Analysis pass — derives semantic information
- Layer 6 (Server) = Backend — produces output for consumers
- Layer 7 (View) = Linker — assembles final deliverable for the user
Each layer transforms an IR to the next. Each layer can be independently tested. The boundaries are well-defined contracts.
From Chen et al. (2024): adding more LLM calls to a quality gate doesn't always improve quality. When the eval task has a mix of easy and hard cases:
- Easy cases benefit from voting (more calls → higher accuracy)
- Hard cases suffer from voting (wrong answers dominate majority)
- There exists an optimal K* that maximizes aggregate performance
For a quality gate using K parallel LLM judges:
K* = 2·log(α/(1-α))·(2p₁-1)/(1-2p₂) / log[p₂(1-p₂)/(p₁(1-p₁))]
where:
α = fraction of "easy" eval cases
p₁ = judge accuracy on easy cases
p₂ = judge accuracy on hard cases
- Don't just "add more judges." There's a mathematically optimal number.
- Estimate difficulty distribution first. Run a small sample to determine what fraction of cases are easy vs. hard for your eval.
- Filter-Vote can outperform Vote. Adding a pre-filter stage can improve hard-case performance by removing obvious bad answers before voting.
- Different gates need different K. A code correctness gate (mostly deterministic) needs different K than a "is this persuasive" gate (highly subjective).
This connects to the category theory composability vision: a quality gate is a morphism WorkProduct → EvalResult. The scaling law tells you how to parameterize that morphism:
qualityGate(K=3, filter=true): WorkProduct → EvalResult // for easy domains
qualityGate(K=7, filter=false): WorkProduct → EvalResult // for hard domains
qualityGate(K=1, filter=false): WorkProduct → EvalResult // for deterministic checks
The gate's type signature is the same. The parameters come from the scaling law analysis. The composition still works.
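Transcribing the K* formula above directly (assuming natural logarithms and the grouping as written; α, p₁, p₂ would come from a pilot sample of the eval task):

```typescript
// K* sketch, a direct transcription of the formula above.
// alpha: fraction of easy cases; p1/p2: judge accuracy on easy/hard cases.
// Grouping and log base are assumptions about the formula as printed.
function optimalK(alpha: number, p1: number, p2: number): number {
  const numerator = 2 * Math.log(alpha / (1 - alpha)) * ((2 * p1 - 1) / (1 - 2 * p2));
  const denominator = Math.log((p2 * (1 - p2)) / (p1 * (1 - p1)));
  return numerator / denominator;
}
```

In practice the result would be rounded to a nearby odd integer (to avoid tied votes) and sanity-checked against a held-out sample before being baked into a gate.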
Putting it all together:
┌─────────────────────────────────────┐
│ COMPILER LAYER │
│ Parse → Verify → Optimize → Emit │
└──────────────┬──────────────────────┘
│ compiled workflow
┌──────────────▼──────────────────────┐
│ DAG ORCHESTRATOR │
│ Topological scheduling │
│ Parallel execution │
│ Dependency tracking │
└──────────────┬──────────────────────┘
│ per-node dispatch
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ FSM-GOVERNED │ │ FSM-GOVERNED │ │ FSM-GOVERNED │
│ TASK NODE │ │ EVAL NODE │ │ AGENT NODE │
│ │ │ │ │ │
│ CREATED→READY→ │ │ Scaling law K* │ │ LLM does work │
│ IN_PROGRESS→ │ │ determines how │ │ within FSM │
│ COMPLETED/FAILED │ │ many judges run │ │ state bounds │
└──────────────────┘ └──────────────────┘ └──────────────────┘
│ │ │
└───────────────────┼───────────────────┘
│ results flow back
┌──────────────▼──────────────────────┐
│ COMPOSABLE BUILDING BLOCKS │
│ Category-theoretic composition │
│ Functors for domain adaptation │
│ Quality gates as morphisms │
└─────────────────────────────────────┘
| Formal Method | Role in Architecture | Concern |
|---|---|---|
| DAG | Workflow topology | What runs when |
| FSM | Per-node lifecycle | How each step executes |
| Category Theory | Building block composition | How pieces fit together |
| Compiler Theory | Workflow validation & optimization | Is it correct, can it be better |
| Scaling Laws | Eval node parameterization | How many judges per gate |
- Deadlock-free — DAG acyclicity guarantees no circular waits
- Termination-guaranteed — FSM liveness properties ensure every task completes
- Type-safe — Compiler verification ensures inputs/outputs match
- Optimized — Scaling laws determine resource allocation per node
- Composable — Category-theoretic composition enables mix-and-match building blocks
- Observable — DAG + FSM state is inherently visual and inspectable
- Verifiable — CTL/LTL properties can be model-checked before execution
- Formalize the session state machine. AgentSeam's 10-state model should have explicit guard conditions and temporal logic properties, following the Allegrini pattern. This enables proving that sessions always terminate, never deadlock, and always recover from failures.
- Add a compilation step to workflow definitions. Before executing a multi-step workflow, validate the graph structure: no cycles, all dependencies satisfiable, all capabilities available. This is LangGraph's .compile() pattern, extended with formal checks.
- Use scaling laws for eval design. When building quality gates, empirically estimate the difficulty distribution of the eval task and compute the optimal K using Chen et al.'s formula. Don't default to "3 judges."
- Define workflow stages as typed morphisms. Each stage has an input type, output type, and transformation function. Composition is automatic when types match. This enables the "composable building blocks" vision.
- Build a DAG orchestrator as a Layer 4+ consumer. The orchestrator sits above AgentSeam's session layer and schedules work across sessions according to a DAG. Each node in the DAG maps to a session (or turn within a session).
- Implement compiler passes for workflow optimization. Before executing a workflow DAG, run analysis passes: cost estimation, parallelism detection, quality gate insertion, resource allocation.
- Visual workflow builder. The DAG structure naturally supports visual editing (nodes and edges). This is the "n8n-style" vision: build workflows by connecting blocks rather than writing code.
- Formal verification integration. Express desired properties in CTL/LTL and use model checkers (SPIN, NuSMV, or custom lightweight checkers) to verify workflows before execution. This catches deadlocks, infinite loops, and safety violations at design time.
- Category-theoretic workflow library. Build a library of composable building blocks (quality gates, transforms, evals, retries) with formal composition rules. Domain adaptation via functors. This is the "accessible and lower-friction" vision for custom harnesses.
The agent orchestration space is converging on graph-based execution models. LangGraph (400 companies, 90M monthly downloads) already uses StateGraph with .compile(). The Allegrini paper provides the formal theory LangGraph currently lacks. Chen et al. provide the scaling theory that quality gate design currently lacks. Category theory provides the composability theory that workflow builders currently lack.
The opportunity is to build an orchestration layer that combines all four:
- Graph structure from LangGraph's practical success
- Formal verification from Allegrini's temporal logic properties
- Scaling optimization from Chen's compound inference theory
- Composable building blocks from category-theoretic composition
This is not theoretical — each of these has existing implementations or mathematical frameworks. The synthesis is new.
- Allegrini et al. — Formalizing Safety, Security, and Functional Properties of Agentic AI Systems
- Hong et al. — MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- Chen et al. — Scaling Laws of Compound Inference Systems
- Agents Are Workflows — FSM/MDP formalization of agent workflows
- LangGraph State Machines — Production state machine patterns
- LangGraph Multi-Agent Orchestration Guide — Graph-based architecture analysis
- The 2026 Guide to Agentic Workflow Architectures — Composable architecture patterns
- Agentic Workflows in 2026 — Actor/critic quality gate patterns
- Building AI Agents with Composable Patterns — Reusable building block patterns
- Compiler-R1: Agentic Compiler Auto-tuning with RL — Compiler pass analogy for agent pipelines
- Agentic AI Infrastructure Landscape 2025-2026 — Seven-layer agentic stack analysis
- Category Theory for Programmers — Foundational reference
- LangGraph Graph API — StateGraph compilation model