Skip to content

Instantly share code, notes, and snippets.

@jkp
Last active March 23, 2026 16:58
Show Gist options
  • Select an option

  • Save jkp/213e7201202f43cb4f88e7106d166bb9 to your computer and use it in GitHub Desktop.

Select an option

Save jkp/213e7201202f43cb4f88e7106d166bb9 to your computer and use it in GitHub Desktop.
Merge primitive design alternatives for superagent pipeline
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Deterministic Pipeline: Pushing Nondeterminism to the Edge</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; background: #0d1117; color: #c9d1d9; line-height: 1.6; padding: 2rem; max-width: 1200px; margin: 0 auto; }
h1 { color: #58a6ff; font-size: 1.8rem; margin-bottom: 0.3rem; }
h2 { color: #79c0ff; font-size: 1.3rem; margin: 2.5rem 0 1rem; border-bottom: 1px solid #21262d; padding-bottom: 0.5rem; }
h3 { color: #d2a8ff; font-size: 1.1rem; margin: 1.5rem 0 0.75rem; }
h4 { color: #c9d1d9; font-size: 0.95rem; margin: 1rem 0 0.5rem; }
p { color: #8b949e; margin-bottom: 0.75rem; }
.subtitle { color: #8b949e; font-size: 1rem; margin-bottom: 2rem; }
strong { color: #c9d1d9; }
.principle { background: #161b22; border-left: 3px solid #58a6ff; padding: 1rem 1.5rem; margin: 1.5rem 0; border-radius: 0 8px 8px 0; }
.principle p { color: #c9d1d9; margin: 0; }
.problem-box { background: #161b22; border: 1px solid #f85149; border-radius: 8px; padding: 1.5rem; margin: 1.5rem 0; }
.problem-box h3 { color: #f85149; margin-top: 0; }
.insight { background: #161b22; border: 1px solid #3fb950; border-radius: 8px; padding: 1.5rem; margin: 1.5rem 0; }
.insight h3 { color: #3fb950; margin-top: 0; }
.diagram { background: #161b22; border: 1px solid #30363d; border-radius: 12px; padding: 2rem; margin: 1.5rem 0; overflow-x: auto; }
.diagram-caption { color: #8b949e; font-size: 0.85rem; text-align: center; margin-top: 0.75rem; font-style: italic; }
svg { display: block; margin: 0 auto; }
svg text { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; }
.approach { background: #161b22; border: 1px solid #30363d; border-radius: 12px; padding: 1.5rem; margin: 1.5rem 0; }
.approach.current { border-color: #f0883e; }
.approach.proposed { border-color: #3fb950; }
.tag { display: inline-block; font-size: 0.75rem; font-weight: 600; padding: 0.15rem 0.5rem; border-radius: 10px; margin-left: 0.5rem; vertical-align: middle; }
.tag.current { background: #f0883e22; color: #f0883e; border: 1px solid #f0883e55; }
.tag.proposed { background: #3fb95022; color: #3fb950; border: 1px solid #3fb95055; }
.tag.hybrid { background: #d2a8ff22; color: #d2a8ff; border: 1px solid #d2a8ff55; }
.pros-cons { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-top: 1rem; }
.pros h4 { color: #3fb950; }
.cons h4 { color: #f85149; }
ul { padding-left: 1.25rem; }
li { color: #8b949e; margin-bottom: 0.4rem; font-size: 0.9rem; }
.compare-table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 0.85rem; }
.compare-table th { text-align: left; padding: 0.6rem; background: #21262d; color: #79c0ff; border-bottom: 2px solid #30363d; }
.compare-table td { padding: 0.6rem; border-bottom: 1px solid #21262d; }
.check { color: #3fb950; }
.cross { color: #f85149; }
.maybe { color: #f0883e; }
.flow-step { display: flex; align-items: flex-start; gap: 1rem; margin: 1rem 0; }
.flow-num { background: #21262d; color: #58a6ff; width: 28px; height: 28px; border-radius: 50%; display: flex; align-items: center; justify-content: center; font-size: 0.8rem; font-weight: 700; flex-shrink: 0; }
.flow-text { flex: 1; }
.flow-text p { margin: 0; }
code { background: #21262d; padding: 0.15rem 0.4rem; border-radius: 4px; font-size: 0.85rem; color: #79c0ff; }
.zone { display: inline-block; padding: 0.1rem 0.4rem; border-radius: 3px; font-size: 0.8rem; font-weight: 600; }
.zone.nondeterministic { background: #f8514922; color: #f85149; }
.zone.deterministic { background: #3fb95022; color: #3fb950; }
.zone.boundary { background: #f0883e22; color: #f0883e; }
</style>
</head>
<body>
<h1>Deterministic Pipeline Architecture</h1>
<p class="subtitle">Push nondeterminism to the edges. Serialize everything onto one thread. Record, replay, reproduce.</p>
<div class="principle">
<p><strong>The principle:</strong> All nondeterminism (network, hardware, timing, thread scheduling) must happen outside a serialization boundary. Inside that boundary, the system is a pure function: given the same sequence of inputs, it always produces the same outputs. This makes the entire system recordable and replayable.</p>
</div>
<!-- ============ THE PROBLEM ============ -->
<h2>Why this matters for our system</h2>
<p>Our interview agent has multiple sources of nondeterminism that all feed into the pipeline simultaneously:</p>
<div class="diagram">
<svg width="850" height="320" viewBox="0 0 850 320">
<defs>
<marker id="arr" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#30363d"/></marker>
<marker id="arr-red" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#f85149"/></marker>
<marker id="arr-green" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#3fb950"/></marker>
<marker id="arr-orange" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#f0883e"/></marker>
<marker id="arr-blue" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#58a6ff"/></marker>
<marker id="arr-purple" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="#d2a8ff"/></marker>
</defs>
<!-- Title -->
<text x="425" y="22" fill="#8b949e" font-size="11" text-anchor="middle" font-weight="600">NONDETERMINISTIC SOURCES (all arriving concurrently)</text>
<!-- External sources -->
<rect x="10" y="40" width="130" height="45" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="75" y="58" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">User's Browser</text>
<text x="75" y="73" fill="#8b949e" font-size="9" text-anchor="middle">audio via WebRTC</text>
<rect x="160" y="40" width="130" height="45" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="225" y="58" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">LLM API</text>
<text x="225" y="73" fill="#8b949e" font-size="9" text-anchor="middle">text tokens streaming</text>
<rect x="310" y="40" width="130" height="45" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="375" y="58" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">TTS Worker</text>
<text x="375" y="73" fill="#8b949e" font-size="9" text-anchor="middle">audio chunks from thread</text>
<rect x="460" y="40" width="130" height="45" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="525" y="58" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">STT Worker</text>
<text x="525" y="73" fill="#8b949e" font-size="9" text-anchor="middle">transcription results</text>
<rect x="610" y="40" width="130" height="45" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="675" y="58" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">VAD / Signals</text>
<text x="675" y="73" fill="#8b949e" font-size="9" text-anchor="middle">user started speaking</text>
<!-- Arrows down to... where? -->
<line x1="75" y1="90" x2="75" y2="130" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3"/>
<line x1="225" y1="90" x2="225" y2="130" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3"/>
<line x1="375" y1="90" x2="375" y2="130" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3"/>
<line x1="525" y1="90" x2="525" y2="130" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3"/>
<line x1="675" y1="90" x2="675" y2="130" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3"/>
<!-- The question -->
<rect x="50" y="130" width="700" height="40" rx="8" fill="#f8514915" stroke="#f85149" stroke-width="1.5" stroke-dasharray="6,3"/>
<text x="400" y="155" fill="#f85149" font-size="12" text-anchor="middle" font-weight="600">? How do these get serialized into one deterministic sequence ?</text>
<!-- Arrow down -->
<line x1="400" y1="175" x2="400" y2="200" stroke="#3fb950" stroke-width="2" marker-end="url(#arr-green)"/>
<!-- Deterministic pipeline -->
<rect x="100" y="205" width="600" height="55" rx="10" fill="#0d1117" stroke="#3fb950" stroke-width="2"/>
<text x="400" y="225" fill="#3fb950" font-size="12" text-anchor="middle" font-weight="700">DETERMINISTIC PIPELINE</text>
<text x="400" y="242" fill="#8b949e" font-size="10" text-anchor="middle">step functions: STT decision &rarr; Splitter &rarr; TTS decision &rarr; Sink decision</text>
<text x="400" y="255" fill="#3fb950" font-size="9" text-anchor="middle">same inputs &rarr; same outputs, every time</text>
<!-- Outputs -->
<line x1="400" y1="265" x2="400" y2="290" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arr-green)"/>
<text x="400" y="308" fill="#8b949e" font-size="10" text-anchor="middle">Effects: send audio to browser, call LLM, start TTS worker, etc.</text>
</svg>
</div>
<!-- ============ CURRENT ARCHITECTURE ============ -->
<h2>Current architecture: merge inside the pipeline</h2>
<div class="approach current">
<h3>What happens today <span class="tag current">Current</span></h3>
<p>Each processor manages its own workers and merges their output into the stream internally. The nondeterminism (thread timing) happens <em>inside</em> the pipeline.</p>
<div class="diagram">
<svg width="850" height="280" viewBox="0 0 850 280">
<!-- Nondeterministic zone -->
<rect x="5" y="5" width="840" height="270" rx="12" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="6,3"/>
<text x="425" y="25" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">NONDETERMINISTIC ZONE (cannot replay)</text>
<!-- Transport -->
<rect x="20" y="45" width="80" height="70" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="60" y="65" fill="#58a6ff" font-size="9" text-anchor="middle" font-weight="600">Transport</text>
<text x="60" y="78" fill="#8b949e" font-size="8" text-anchor="middle">WebRTC</text>
<text x="60" y="90" fill="#8b949e" font-size="8" text-anchor="middle">audio in</text>
<text x="60" y="102" fill="#f85149" font-size="7" text-anchor="middle">nondeterministic</text>
<line x1="105" y1="80" x2="140" y2="80" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- ParakeetSTT -->
<rect x="145" y="40" width="140" height="110" rx="8" fill="#1c2129" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="215" y="60" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">ParakeetSTT</text>
<rect x="160" y="80" width="110" height="30" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1" stroke-dasharray="3,2"/>
<text x="215" y="93" fill="#f85149" font-size="8" text-anchor="middle">STT Worker Thread</text>
<text x="215" y="103" fill="#f85149" font-size="7" text-anchor="middle">nondeterministic</text>
<text x="215" y="140" fill="#f0883e" font-size="8" text-anchor="middle">internal merge</text>
<line x1="290" y1="80" x2="315" y2="80" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- LLM -->
<rect x="320" y="40" width="100" height="80" rx="8" fill="#1c2129" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="370" y="60" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">LLM Call</text>
<rect x="335" y="75" width="70" height="25" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1" stroke-dasharray="3,2"/>
<text x="370" y="91" fill="#f85149" font-size="8" text-anchor="middle">network</text>
<line x1="425" y1="80" x2="445" y2="80" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- Splitter -->
<rect x="450" y="50" width="80" height="50" rx="8" fill="#1c2129" stroke="#3fb950" stroke-width="1.5"/>
<text x="490" y="72" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="600">Splitter</text>
<text x="490" y="85" fill="#3fb950" font-size="8" text-anchor="middle">pure</text>
<line x1="535" y1="75" x2="555" y2="75" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- KokoroTTS -->
<rect x="560" y="40" width="140" height="110" rx="8" fill="#1c2129" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="630" y="60" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">KokoroTTS</text>
<rect x="575" y="80" width="110" height="30" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1" stroke-dasharray="3,2"/>
<text x="630" y="93" fill="#f85149" font-size="8" text-anchor="middle">TTS Worker Thread</text>
<text x="630" y="103" fill="#f85149" font-size="7" text-anchor="middle">nondeterministic</text>
<text x="630" y="140" fill="#f0883e" font-size="8" text-anchor="middle">stream.merge(worker)</text>
<line x1="705" y1="80" x2="730" y2="80" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- Transport out -->
<rect x="735" y="45" width="95" height="70" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="782" y="65" fill="#58a6ff" font-size="9" text-anchor="middle" font-weight="600">Transport</text>
<text x="782" y="78" fill="#8b949e" font-size="8" text-anchor="middle">WebRTC</text>
<text x="782" y="90" fill="#8b949e" font-size="8" text-anchor="middle">audio out</text>
<text x="782" y="102" fill="#f85149" font-size="7" text-anchor="middle">nondeterministic</text>
<!-- Signal arrows -->
<rect x="300" y="190" width="200" height="35" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="400" y="207" fill="#f85149" font-size="9" text-anchor="middle">UserStartedSpeakingSignal</text>
<text x="400" y="220" fill="#8b949e" font-size="8" text-anchor="middle">arrives at unpredictable time</text>
<line x1="400" y1="190" x2="400" y2="165" stroke="#f85149" stroke-width="1" stroke-dasharray="3,2" marker-end="url(#arr-red)"/>
<!-- Problem callout -->
<text x="425" y="260" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">Workers + network + signals all interleave nondeterministically inside the pipeline</text>
</svg>
</div>
<div class="problem-box">
<h3>The replay problem</h3>
<p>If you record the inputs to this pipeline (audio from transport, signals from VAD), you <strong>cannot replay</strong> and get the same behavior. Why? Because the TTS worker thread inside KokoroTTS produces audio at its own pace. A <code>UserStartedSpeakingSignal</code> might arrive between chunk 3 and chunk 4 in production, but between chunk 5 and chunk 6 on replay. The merge inside the pipeline creates timing-dependent interleaving that varies between runs.</p>
<p>You'd have to also record the exact output of every worker thread, and replay those too. But then you're not testing the system — you're playing back a recording.</p>
</div>
</div>
<!-- ============ PROPOSED: SERIALIZATION BOUNDARY ============ -->
<h2>Proposed: serialization boundary at the edge</h2>
<div class="approach proposed">
<h3>Everything serialized onto one queue <span class="tag proposed">Proposed</span></h3>
<p>All nondeterministic sources (network, workers, signals) push packets into a single ordered queue <em>before</em> the pipeline. The pipeline reads from this one queue and is fully deterministic. Record the queue, replay the queue, get identical behavior.</p>
<div class="diagram">
<svg width="850" height="380" viewBox="0 0 850 380">
<!-- Nondeterministic zone -->
<rect x="5" y="5" width="310" height="270" rx="12" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="6,3"/>
<text x="160" y="25" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">NONDETERMINISTIC ZONE</text>
<!-- External sources -->
<rect x="20" y="40" width="120" height="35" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="80" y="62" fill="#f85149" font-size="9" text-anchor="middle">User audio (WebRTC)</text>
<rect x="20" y="85" width="120" height="35" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="80" y="107" fill="#f85149" font-size="9" text-anchor="middle">VAD signals</text>
<rect x="20" y="130" width="120" height="35" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="80" y="152" fill="#f85149" font-size="9" text-anchor="middle">LLM tokens (API)</text>
<rect x="20" y="175" width="120" height="35" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="80" y="197" fill="#f85149" font-size="9" text-anchor="middle">TTS audio (worker)</text>
<rect x="20" y="220" width="120" height="35" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="80" y="242" fill="#f85149" font-size="9" text-anchor="middle">STT text (worker)</text>
<!-- Arrows to serializer -->
<line x1="145" y1="57" x2="200" y2="130" stroke="#f85149" stroke-width="1" marker-end="url(#arr-red)"/>
<line x1="145" y1="102" x2="200" y2="135" stroke="#f85149" stroke-width="1" marker-end="url(#arr-red)"/>
<line x1="145" y1="147" x2="200" y2="145" stroke="#f85149" stroke-width="1" marker-end="url(#arr-red)"/>
<line x1="145" y1="192" x2="200" y2="155" stroke="#f85149" stroke-width="1" marker-end="url(#arr-red)"/>
<line x1="145" y1="237" x2="200" y2="165" stroke="#f85149" stroke-width="1" marker-end="url(#arr-red)"/>
<!-- Serialization point -->
<rect x="200" y="100" width="100" height="90" rx="8" fill="#21262d" stroke="#f0883e" stroke-width="2.5"/>
<text x="250" y="125" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="700">Dispatcher</text>
<text x="250" y="142" fill="#8b949e" font-size="8" text-anchor="middle">serializes all</text>
<text x="250" y="154" fill="#8b949e" font-size="8" text-anchor="middle">sources into</text>
<text x="250" y="166" fill="#8b949e" font-size="8" text-anchor="middle">one ordered</text>
<text x="250" y="178" fill="#f0883e" font-size="8" text-anchor="middle" font-weight="600">queue</text>
<!-- Serialization boundary line -->
<line x1="320" y1="5" x2="320" y2="275" stroke="#f0883e" stroke-width="2.5"/>
<text x="323" y="287" fill="#f0883e" font-size="10" font-weight="700">SERIALIZATION BOUNDARY</text>
<!-- Recording point -->
<rect x="328" y="115" width="55" height="60" rx="6" fill="#f0883e22" stroke="#f0883e" stroke-width="1.5"/>
<text x="355" y="137" fill="#f0883e" font-size="18" text-anchor="middle">&#9210;</text>
<text x="355" y="157" fill="#f0883e" font-size="8" text-anchor="middle" font-weight="600">RECORD</text>
<text x="355" y="168" fill="#f0883e" font-size="7" text-anchor="middle">HERE</text>
<line x1="305" y1="145" x2="328" y2="145" stroke="#f0883e" stroke-width="2" marker-end="url(#arr-orange)"/>
<!-- Deterministic zone -->
<rect x="390" y="5" width="450" height="270" rx="12" fill="none" stroke="#3fb950" stroke-width="1.5"/>
<text x="615" y="25" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="600">DETERMINISTIC ZONE (fully replayable)</text>
<!-- Arrow from record to pipeline -->
<line x1="388" y1="145" x2="415" y2="145" stroke="#3fb950" stroke-width="2" marker-end="url(#arr-green)"/>
<text x="400" y="135" fill="#3fb950" font-size="7" text-anchor="middle">ordered</text>
<text x="400" y="160" fill="#3fb950" font-size="7" text-anchor="middle">packets</text>
<!-- Pipeline processors (all pure step functions) -->
<rect x="420" y="75" width="95" height="55" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="467" y="95" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">STT Step</text>
<text x="467" y="108" fill="#3fb950" font-size="8" text-anchor="middle">decision only</text>
<text x="467" y="120" fill="#8b949e" font-size="7" text-anchor="middle">no worker</text>
<line x1="520" y1="100" x2="535" y2="100" stroke="#3fb950" stroke-width="1" marker-end="url(#arr-green)"/>
<rect x="540" y="75" width="85" height="55" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="582" y="95" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Splitter</text>
<text x="582" y="108" fill="#3fb950" font-size="8" text-anchor="middle">pure</text>
<line x1="630" y1="100" x2="645" y2="100" stroke="#3fb950" stroke-width="1" marker-end="url(#arr-green)"/>
<rect x="650" y="75" width="95" height="55" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="697" y="95" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">TTS Step</text>
<text x="697" y="108" fill="#3fb950" font-size="8" text-anchor="middle">decision only</text>
<text x="697" y="120" fill="#8b949e" font-size="7" text-anchor="middle">no worker</text>
<line x1="750" y1="100" x2="765" y2="100" stroke="#3fb950" stroke-width="1" marker-end="url(#arr-green)"/>
<rect x="770" y="80" width="55" height="45" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="797" y="100" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Sink</text>
<text x="797" y="112" fill="#3fb950" font-size="8" text-anchor="middle">pure</text>
<!-- Effects output -->
<line x1="615" y1="135" x2="615" y2="160" stroke="#d2a8ff" stroke-width="1.5" marker-end="url(#arr-purple)"/>
<rect x="440" y="165" width="360" height="50" rx="6" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="620" y="185" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">Effects (side-effect descriptors)</text>
<text x="620" y="200" fill="#8b949e" font-size="8" text-anchor="middle">StartSynthesis, Transcribe, PlayAudio, StopPlayback, SendToTransport...</text>
<!-- Effects back to nondeterministic zone -->
<line x1="440" y1="215" x2="440" y2="250" stroke="#d2a8ff" stroke-width="1"/>
<line x1="440" y1="250" x2="160" y2="250" stroke="#d2a8ff" stroke-width="1"/>
<line x1="160" y1="250" x2="160" y2="220" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<text x="300" y="245" fill="#d2a8ff" font-size="8" text-anchor="middle">effects executed in nondeterministic zone</text>
<text x="300" y="258" fill="#8b949e" font-size="8" text-anchor="middle">results feed back into dispatcher</text>
<!-- Replay callout -->
<rect x="390" y="295" width="450" height="70" rx="8" fill="#3fb95015" stroke="#3fb950" stroke-width="1"/>
<text x="615" y="315" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="700">REPLAY</text>
<text x="615" y="332" fill="#8b949e" font-size="9" text-anchor="middle">Feed the recorded packet sequence into the pipeline.</text>
<text x="615" y="348" fill="#8b949e" font-size="9" text-anchor="middle">Same inputs &rarr; same state transitions &rarr; same effects &rarr; same outputs.</text>
<text x="615" y="362" fill="#3fb950" font-size="9" text-anchor="middle">Hypothesis can shrink the sequence to find the minimal failing case.</text>
</svg>
</div>
</div>
<!-- ============ HOW EFFECTS LOOP BACK ============ -->
<h2>How effects loop back</h2>
<p>The key insight: workers don't live inside processors anymore. They live in the nondeterministic zone. The pipeline emits <em>effects</em> ("start synthesizing this text"), and an <strong>effect executor</strong> in the nondeterministic zone handles them. When the worker produces results, those results feed back through the dispatcher into the deterministic pipeline.</p>
<div class="diagram">
<svg width="850" height="300" viewBox="0 0 850 300">
<!-- The loop -->
<!-- Pipeline (center) -->
<rect x="270" y="30" width="310" height="70" rx="10" fill="#0d1117" stroke="#3fb950" stroke-width="2"/>
<text x="425" y="55" fill="#3fb950" font-size="12" text-anchor="middle" font-weight="700">Deterministic Pipeline</text>
<text x="425" y="72" fill="#8b949e" font-size="9" text-anchor="middle">step functions only, no I/O</text>
<!-- Input arrow -->
<line x1="200" y1="65" x2="265" y2="65" stroke="#3fb950" stroke-width="2" marker-end="url(#arr-green)"/>
<text x="232" y="55" fill="#3fb950" font-size="8" text-anchor="middle">ordered</text>
<text x="232" y="80" fill="#3fb950" font-size="8" text-anchor="middle">packets</text>
<!-- Dispatcher -->
<rect x="100" y="35" width="95" height="60" rx="8" fill="#21262d" stroke="#f0883e" stroke-width="2"/>
<text x="147" y="60" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="700">Dispatcher</text>
<text x="147" y="75" fill="#8b949e" font-size="8" text-anchor="middle">serializes</text>
<!-- External input -->
<line x1="50" y1="65" x2="95" y2="65" stroke="#f85149" stroke-width="1.5" marker-end="url(#arr-red)"/>
<text x="30" y="62" fill="#f85149" font-size="8" text-anchor="middle">user</text>
<text x="30" y="74" fill="#f85149" font-size="8" text-anchor="middle">audio</text>
<!-- Effects output (down) -->
<line x1="425" y1="105" x2="425" y2="140" stroke="#d2a8ff" stroke-width="2" marker-end="url(#arr-purple)"/>
<!-- Effect executor -->
<rect x="300" y="145" width="250" height="60" rx="8" fill="#21262d" stroke="#d2a8ff" stroke-width="2"/>
<text x="425" y="168" fill="#d2a8ff" font-size="11" text-anchor="middle" font-weight="700">Effect Executor</text>
<text x="425" y="185" fill="#8b949e" font-size="9" text-anchor="middle">runs workers, makes API calls, sends to transport</text>
<!-- Workers below -->
<rect x="170" y="230" width="100" height="40" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="220" y="248" fill="#f85149" font-size="9" text-anchor="middle">TTS Worker</text>
<text x="220" y="261" fill="#8b949e" font-size="7" text-anchor="middle">OutputAudio</text>
<rect x="290" y="230" width="100" height="40" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="340" y="248" fill="#f85149" font-size="9" text-anchor="middle">STT Worker</text>
<text x="340" y="261" fill="#8b949e" font-size="7" text-anchor="middle">Transcript</text>
<rect x="410" y="230" width="100" height="40" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="460" y="248" fill="#f85149" font-size="9" text-anchor="middle">LLM API</text>
<text x="460" y="261" fill="#8b949e" font-size="7" text-anchor="middle">TextFrame tokens</text>
<rect x="530" y="230" width="100" height="40" rx="5" fill="#21262d" stroke="#f85149" stroke-width="1"/>
<text x="580" y="248" fill="#f85149" font-size="9" text-anchor="middle">Transport</text>
<text x="580" y="261" fill="#8b949e" font-size="7" text-anchor="middle">send to browser</text>
<!-- Lines from executor to workers -->
<line x1="350" y1="210" x2="220" y2="225" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<line x1="380" y1="210" x2="340" y2="225" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<line x1="450" y1="210" x2="460" y2="225" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<line x1="480" y1="210" x2="580" y2="225" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<!-- Results loop back -->
<path d="M 220 275 L 220 290 L 80 290 L 80 65 L 95 65" fill="none" stroke="#f85149" stroke-width="1.5" stroke-dasharray="5,3" marker-end="url(#arr-red)"/>
<text x="80" y="178" fill="#f85149" font-size="8" text-anchor="middle" transform="rotate(-90, 80, 178)">results feed back</text>
<path d="M 340 275 L 340 285 L 85 285" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="5,3"/>
<path d="M 460 275 L 460 280 L 90 280" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="5,3"/>
</svg>
<p class="diagram-caption">Workers produce results asynchronously. Those results feed back through the dispatcher, getting serialized alongside all other inputs. The pipeline never sees nondeterministic timing.</p>
</div>
<!-- ============ CONCRETE EXAMPLE ============ -->
<h2>Concrete example: barge-in during synthesis</h2>
<h4>What the dispatcher's queue looks like:</h4>
<div class="diagram">
<svg width="800" height="200" viewBox="0 0 800 200">
<!-- Queue packets -->
<rect x="10" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="55" y="58" fill="#58a6ff" font-size="8" text-anchor="middle" font-weight="600">InputAudio</text>
<text x="55" y="72" fill="#8b949e" font-size="7" text-anchor="middle">from user mic</text>
<text x="55" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=0</text>
<rect x="110" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="155" y="58" fill="#58a6ff" font-size="8" text-anchor="middle" font-weight="600">InputAudio</text>
<text x="155" y="72" fill="#8b949e" font-size="7" text-anchor="middle">from user mic</text>
<text x="155" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=1</text>
<rect x="210" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="255" y="58" fill="#3fb950" font-size="8" text-anchor="middle" font-weight="600">OutputAudio</text>
<text x="255" y="72" fill="#8b949e" font-size="7" text-anchor="middle">from TTS worker</text>
<text x="255" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=2</text>
<rect x="310" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="355" y="58" fill="#3fb950" font-size="8" text-anchor="middle" font-weight="600">OutputAudio</text>
<text x="355" y="72" fill="#8b949e" font-size="7" text-anchor="middle">from TTS worker</text>
<text x="355" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=3</text>
<rect x="410" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#f85149" stroke-width="2"/>
<text x="455" y="55" fill="#f85149" font-size="7" text-anchor="middle" font-weight="600">UserStarted</text>
<text x="455" y="65" fill="#f85149" font-size="7" text-anchor="middle" font-weight="600">Speaking</text>
<text x="455" y="78" fill="#f85149" font-size="7" text-anchor="middle">BARGE-IN</text>
<text x="455" y="30" fill="#f85149" font-size="8" text-anchor="middle" font-weight="600">t=4</text>
<rect x="510" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#3fb950" stroke-width="1" stroke-dasharray="3,2" opacity="0.5"/>
<text x="555" y="58" fill="#3fb950" font-size="8" text-anchor="middle" opacity="0.5">OutputAudio</text>
<text x="555" y="72" fill="#8b949e" font-size="7" text-anchor="middle" opacity="0.5">stale (aborted)</text>
<text x="555" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=5</text>
<rect x="610" y="40" width="90" height="45" rx="4" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="655" y="58" fill="#58a6ff" font-size="8" text-anchor="middle" font-weight="600">InputAudio</text>
<text x="655" y="72" fill="#8b949e" font-size="7" text-anchor="middle">user speaking</text>
<text x="655" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=6</text>
<rect x="710" y="40" width="80" height="45" rx="4" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="750" y="58" fill="#58a6ff" font-size="8" text-anchor="middle" font-weight="600">EndFrame</text>
<text x="750" y="72" fill="#8b949e" font-size="7" text-anchor="middle">of=InputAudio</text>
<text x="750" y="30" fill="#8b949e" font-size="8" text-anchor="middle">t=7</text>
<!-- Arrow showing direction -->
<line x1="20" y1="100" x2="780" y2="100" stroke="#f0883e" stroke-width="1.5" marker-end="url(#arr-orange)"/>
<text x="400" y="118" fill="#f0883e" font-size="9" text-anchor="middle" font-weight="600">serialized order &mdash; this is what gets recorded</text>
<!-- Explanation -->
<text x="400" y="145" fill="#8b949e" font-size="9" text-anchor="middle">The step function processes these one at a time, in this exact order.</text>
<text x="400" y="160" fill="#8b949e" font-size="9" text-anchor="middle">At t=4, it sees UserStartedSpeakingSignal &rarr; emits Abort effect &rarr; sets synthesizing=false.</text>
<text x="400" y="175" fill="#8b949e" font-size="9" text-anchor="middle">At t=5, stale OutputAudio arrives &rarr; step function knows it's not synthesizing &rarr; drops or forwards.</text>
<text x="400" y="190" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Replay this sequence &rarr; identical behavior, every time.</text>
</svg>
</div>
<!-- ============ TRADEOFFS ============ -->
<h2>Tradeoffs</h2>
<table class="compare-table">
<tr>
<th>Aspect</th>
<th>Current (merge inside)</th>
<th>Proposed (dispatcher at edge)</th>
</tr>
<tr>
<td>Replayability</td>
<td class="cross">Cannot replay &mdash; worker timing varies</td>
<td class="check">Fully replayable &mdash; one ordered sequence</td>
</tr>
<tr>
<td>Property testing</td>
<td class="maybe">Step functions testable, but composition isn't</td>
<td class="check">Entire pipeline is a pure function of its input sequence</td>
</tr>
<tr>
<td>Hypothesis shrinking</td>
<td class="cross">Can't shrink interleaved async sequences</td>
<td class="check">Shrink the packet sequence like any other list</td>
</tr>
<tr>
<td>Debugging</td>
<td class="cross">Must reproduce timing to reproduce bug</td>
<td class="check">Record input sequence, replay deterministically</td>
</tr>
<tr>
<td>Processor simplicity</td>
<td class="maybe">Each processor manages its own worker</td>
<td class="check">Processors are just step functions</td>
</tr>
<tr>
<td>Architecture change</td>
<td class="check">No change needed</td>
<td class="cross">Workers must move outside pipeline; needs dispatcher</td>
</tr>
<tr>
<td>Latency</td>
<td class="check">Worker output goes directly to stream</td>
<td class="maybe">Extra hop through dispatcher (likely negligible)</td>
</tr>
<tr>
<td>Stale packet handling</td>
<td class="cross">FIXME in code &mdash; stale OutputAudio in queue</td>
<td class="check">Step function sees abort before stale packets &mdash; can handle deterministically</td>
</tr>
</table>
<!-- ============ WHAT NEEDS TO CHANGE ============ -->
<h2>What would need to change</h2>
<div class="flow-step">
<div class="flow-num">1</div>
<div class="flow-text"><p><strong>Build the Dispatcher.</strong> A single async component that accepts packets from any source (transport, workers, signals) via an <code>asyncio.Queue</code> and emits them in arrival order. This is the serialization point. All recording happens here.</p></div>
</div>
<div class="flow-step">
<div class="flow-num">2</div>
<div class="flow-text"><p><strong>Move workers outside processors.</strong> KokoroTTS no longer owns its TTS thread. Instead, it emits <code>StartSynthesis</code> effects. An Effect Executor (in the nondeterministic zone) receives these, starts the worker, and feeds the worker's output back through the Dispatcher.</p></div>
</div>
<div class="flow-step">
<div class="flow-num">3</div>
<div class="flow-text"><p><strong>Processors become pure step functions.</strong> No <code>run()</code> method with async generators. Each processor is just its step function. The pipeline is a loop: read packet from dispatcher, run through step functions in order, collect effects, send effects to executor.</p></div>
</div>
<div class="flow-step">
<div class="flow-num">4</div>
<div class="flow-text"><p><strong>Transport becomes a source and a sink in the nondeterministic zone.</strong> Audio from the user's browser arrives at the Dispatcher. Audio for the user is sent via a <code>SendToTransport</code> effect. Both are outside the deterministic boundary.</p></div>
</div>
<div class="flow-step">
<div class="flow-num">5</div>
<div class="flow-text"><p><strong>Recording is trivial.</strong> Log every packet the Dispatcher emits. Replay by feeding that log back into the pipeline. Hypothesis can generate synthetic packet sequences that simulate any interleaving.</p></div>
</div>
<!-- ============ OPEN QUESTIONS ============ -->
<h2>Open questions</h2>
<div class="approach">
<h4>Does the dispatcher need priority?</h4>
<p>Should <code>UserStartedSpeakingSignal</code> jump the queue? In the current design, <code>merge</code> gives signals priority. In the dispatcher model, if the signal arrives while 10 <code>OutputAudio</code> packets are already queued, does it wait behind them? Probably yes &mdash; and that's fine, because the step function handles the signal whenever it arrives and aborts cleanly. The "priority" is in the decision logic, not the delivery order.</p>
<h4>Does this add latency?</h4>
<p>The extra hop (worker &rarr; dispatcher &rarr; pipeline) adds one queue read. For audio at 16kHz with 10ms chunks, that's negligible. But it's worth measuring.</p>
<h4>Can we migrate incrementally?</h4>
<p>Yes. The step functions already exist. The dispatcher is a new component. Workers can be extracted from processors one at a time. The pipeline can run in "hybrid" mode during migration &mdash; some processors use step functions via the dispatcher, others still use <code>run()</code> with merge.</p>
<h4>What about the LLM call?</h4>
<p>The LLM is a network call that streams tokens back. In the current design, it would be a processor with its own async iteration. In the dispatcher model, it would be an effect: <code>CallLLM(prompt)</code>. The executor makes the API call, and streaming tokens feed back through the dispatcher as <code>TextFrame</code> packets. The pipeline sees them in order, deterministically.</p>
</div>
<!-- ============ DISPATCHER DESIGN ============ -->
<h2>The dispatcher is dumb</h2>
<p>A key design decision: the dispatcher has <strong>no intelligence</strong>. It's a plain FIFO queue. Whatever packet arrives first gets emitted first. No priority, no reordering, no filtering.</p>
<div class="principle">
<p><strong>Why dumb?</strong> If the dispatcher makes decisions, those decisions are nondeterministic (they depend on timing). All intelligence must live inside the deterministic pipeline, where it's testable and replayable. The dispatcher's only job is to impose a total ordering on concurrent events.</p>
</div>
<div class="diagram">
<svg width="800" height="280" viewBox="0 0 800 280">
<!-- Concurrent arrivals -->
<text x="120" y="20" fill="#8b949e" font-size="10" text-anchor="middle" font-weight="600">CONCURRENT EVENTS (real-time)</text>
<!-- Timeline -->
<line x1="30" y1="45" x2="230" y2="45" stroke="#30363d" stroke-width="1"/>
<text x="130" y="40" fill="#8b949e" font-size="8" text-anchor="middle">wall clock time &rarr;</text>
<!-- Events arriving at different times -->
<rect x="40" y="55" width="55" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="67" y="73" fill="#3fb950" font-size="7" text-anchor="middle">OutAudio</text>
<rect x="70" y="95" width="55" height="28" rx="3" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="97" y="113" fill="#58a6ff" font-size="7" text-anchor="middle">InAudio</text>
<rect x="90" y="55" width="55" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="117" y="73" fill="#3fb950" font-size="7" text-anchor="middle">OutAudio</text>
<rect x="130" y="95" width="55" height="28" rx="3" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="157" y="109" fill="#f85149" font-size="6" text-anchor="middle">UserStarted</text>
<text x="157" y="118" fill="#f85149" font-size="6" text-anchor="middle">Speaking</text>
<rect x="180" y="55" width="55" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1" opacity="0.5"/>
<text x="207" y="73" fill="#3fb950" font-size="7" text-anchor="middle" opacity="0.5">OutAudio</text>
<!-- Arrow down to dispatcher -->
<line x1="130" y1="135" x2="130" y2="155" stroke="#f0883e" stroke-width="1.5" marker-end="url(#arr-orange)"/>
<!-- Dispatcher -->
<rect x="60" y="160" width="140" height="50" rx="8" fill="#21262d" stroke="#f0883e" stroke-width="2"/>
<text x="130" y="180" fill="#f0883e" font-size="11" text-anchor="middle" font-weight="700">Dispatcher</text>
<text x="130" y="195" fill="#8b949e" font-size="8" text-anchor="middle">FIFO &mdash; no logic</text>
<!-- Arrow to serialized output -->
<line x1="205" y1="185" x2="280" y2="185" stroke="#f0883e" stroke-width="2" marker-end="url(#arr-orange)"/>
<!-- Serialized queue -->
<text x="550" y="158" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="600">SERIALIZED OUTPUT (deterministic)</text>
<rect x="290" y="170" width="60" height="30" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="320" y="189" fill="#3fb950" font-size="7" text-anchor="middle">OutAudio</text>
<rect x="358" y="170" width="55" height="30" rx="3" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="385" y="189" fill="#58a6ff" font-size="7" text-anchor="middle">InAudio</text>
<rect x="421" y="170" width="60" height="30" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="451" y="189" fill="#3fb950" font-size="7" text-anchor="middle">OutAudio</text>
<rect x="489" y="170" width="68" height="30" rx="3" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="523" y="185" fill="#f85149" font-size="6" text-anchor="middle">UserStarted</text>
<text x="523" y="194" fill="#f85149" font-size="6" text-anchor="middle">Speaking</text>
<rect x="565" y="170" width="60" height="30" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1" opacity="0.5"/>
<text x="595" y="189" fill="#3fb950" font-size="7" text-anchor="middle" opacity="0.5">OutAudio</text>
<!-- Arrow to pipeline -->
<line x1="635" y1="185" x2="690" y2="185" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arr-green)"/>
<rect x="695" y="170" width="90" height="30" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="740" y="189" fill="#3fb950" font-size="9" text-anchor="middle">Pipeline</text>
<!-- Key point -->
<rect x="250" y="220" width="500" height="50" rx="6" fill="#161b22" stroke="#30363d" stroke-width="1"/>
<text x="500" y="238" fill="#8b949e" font-size="9" text-anchor="middle">The signal doesn't jump the queue. It arrives at t=4 in the serialized order.</text>
<text x="500" y="253" fill="#3fb950" font-size="9" text-anchor="middle">The pipeline's step functions handle the abort &mdash; that's where the intelligence lives.</text>
<text x="500" y="268" fill="#f0883e" font-size="9" text-anchor="middle">On replay, this exact sequence is fed back in. Identical behavior guaranteed.</text>
</svg>
</div>
<div class="insight">
<h3>Why no signal priority in the dispatcher?</h3>
<p>It's tempting to give signals priority in the dispatcher (like <code>merge</code> does today). But that would make the dispatcher stateful and its output timing-dependent. If a signal arrives 1ms before vs 1ms after an audio chunk, the output order changes. That's nondeterminism.</p>
<p>Instead: FIFO order means the signal arrives wherever it arrives. The step function sees it and handles it. If two <code>OutputAudio</code> packets sneak in before the signal, they get played &mdash; and that's fine. The abort happens immediately after. On replay, those same two packets sneak in, and the same abort happens. Deterministic.</p>
<p>The latency cost of processing a few extra packets before the signal is negligible (microseconds of step function execution). The real latency in barge-in is the audio pipeline, not the step function.</p>
</div>
<!-- ============ TICK GRANULARITY ============ -->
<h2>What happens in one tick?</h2>
<p>The pipeline reads one packet from the dispatcher per tick. The question is: what happens to the outputs?</p>
<h3>Option 1: Single pass through the chain</h3>
<div class="diagram">
<svg width="800" height="320" viewBox="0 0 800 320">
<!-- Tick label -->
<rect x="10" y="10" width="780" height="25" rx="4" fill="#21262d"/>
<text x="400" y="27" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="600">ONE TICK: packet flows through all step functions, effects collected at the end</text>
<!-- Input packet -->
<rect x="20" y="55" width="80" height="35" rx="5" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="60" y="72" fill="#58a6ff" font-size="9" text-anchor="middle">InputAudio</text>
<text x="60" y="83" fill="#8b949e" font-size="7" text-anchor="middle">from queue</text>
<line x1="105" y1="72" x2="135" y2="72" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- STT step -->
<rect x="140" y="50" width="110" height="80" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="195" y="68" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">STT Step</text>
<text x="195" y="82" fill="#8b949e" font-size="7" text-anchor="middle">state: accumulating</text>
<text x="195" y="96" fill="#8b949e" font-size="7" text-anchor="middle">output: [packet]</text>
<text x="195" y="110" fill="#d2a8ff" font-size="7" text-anchor="middle">effects: []</text>
<text x="195" y="122" fill="#3fb950" font-size="7" text-anchor="middle">new state: +1 frame</text>
<line x1="255" y1="72" x2="285" y2="72" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- Splitter step -->
<rect x="290" y="50" width="110" height="80" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="345" y="68" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Splitter Step</text>
<text x="345" y="82" fill="#8b949e" font-size="7" text-anchor="middle">InputAudio? not text</text>
<text x="345" y="96" fill="#8b949e" font-size="7" text-anchor="middle">output: [packet]</text>
<text x="345" y="110" fill="#d2a8ff" font-size="7" text-anchor="middle">effects: []</text>
<text x="345" y="122" fill="#3fb950" font-size="7" text-anchor="middle">state unchanged</text>
<line x1="405" y1="72" x2="435" y2="72" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- TTS step -->
<rect x="440" y="50" width="110" height="80" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="495" y="68" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">TTS Step</text>
<text x="495" y="82" fill="#8b949e" font-size="7" text-anchor="middle">InputAudio? not text</text>
<text x="495" y="96" fill="#8b949e" font-size="7" text-anchor="middle">output: [packet]</text>
<text x="495" y="110" fill="#d2a8ff" font-size="7" text-anchor="middle">effects: []</text>
<text x="495" y="122" fill="#3fb950" font-size="7" text-anchor="middle">state unchanged</text>
<line x1="555" y1="72" x2="585" y2="72" stroke="#30363d" stroke-width="1.5" marker-end="url(#arr)"/>
<!-- Sink step -->
<rect x="590" y="50" width="110" height="80" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="645" y="68" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Sink Step</text>
<text x="645" y="82" fill="#8b949e" font-size="7" text-anchor="middle">InputAudio? forward</text>
<text x="645" y="96" fill="#8b949e" font-size="7" text-anchor="middle">output: [packet]</text>
<text x="645" y="110" fill="#d2a8ff" font-size="7" text-anchor="middle">effects: [SendAudio]</text>
<text x="645" y="122" fill="#3fb950" font-size="7" text-anchor="middle">state unchanged</text>
<!-- Collected effects -->
<line x1="645" y1="135" x2="645" y2="155" stroke="#d2a8ff" stroke-width="1.5" marker-end="url(#arr-purple)"/>
<rect x="400" y="160" width="370" height="35" rx="5" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="585" y="182" fill="#d2a8ff" font-size="9" text-anchor="middle">Collected effects from all stages: [SendAudio(...)]</text>
<line x1="585" y1="200" x2="585" y2="220" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<text x="585" y="238" fill="#8b949e" font-size="9" text-anchor="middle">Fire-and-forget to effect executor</text>
<text x="585" y="252" fill="#8b949e" font-size="9" text-anchor="middle">Results arrive later via dispatcher</text>
<!-- Now show a more interesting tick -->
<rect x="10" y="275" width="780" height="25" rx="4" fill="#21262d"/>
<text x="400" y="292" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="600">Each stage's output packets become the next stage's input &mdash; one packet can produce effects at multiple stages</text>
</svg>
</div>
<p>The single-pass approach is simple: one packet enters, it flows through every step function in order, each step function transforms or forwards it, and effects are collected. After the pass, all effects are sent to the executor at once.</p>
<p>When a step function produces <em>new</em> packets (like STT producing a <code>Transcript</code> from a <code>Transcribe</code> effect result), those new packets need to flow through downstream step functions too. This happens within the same tick &mdash; the Transcript flows through Splitter then TTS in the same pass.</p>
<h3>Option 2: Intermediate results re-enter the queue</h3>
<div class="diagram">
<svg width="800" height="200" viewBox="0 0 800 200">
<rect x="10" y="10" width="780" height="25" rx="4" fill="#21262d"/>
<text x="400" y="27" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="600">ALTERNATIVE: each stage is independent, outputs re-enter the dispatcher queue</text>
<!-- Input -->
<rect x="30" y="60" width="60" height="30" rx="4" fill="#21262d" stroke="#58a6ff" stroke-width="1"/>
<text x="60" y="79" fill="#58a6ff" font-size="8" text-anchor="middle">InAudio</text>
<line x1="95" y1="75" x2="120" y2="75" stroke="#30363d" stroke-width="1" marker-end="url(#arr)"/>
<!-- STT -->
<rect x="125" y="55" width="70" height="40" rx="5" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="160" y="79" fill="#3fb950" font-size="9" text-anchor="middle">STT</text>
<!-- Output goes back to queue -->
<line x1="200" y1="75" x2="230" y2="75" stroke="#f0883e" stroke-width="1"/>
<line x1="230" y1="75" x2="230" y2="130" stroke="#f0883e" stroke-width="1"/>
<line x1="230" y1="130" x2="60" y2="130" stroke="#f0883e" stroke-width="1"/>
<line x1="60" y1="130" x2="60" y2="100" stroke="#f0883e" stroke-width="1" marker-end="url(#arr-orange)"/>
<rect x="240" y="115" width="130" height="25" rx="4" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="305" y="132" fill="#f0883e" font-size="8" text-anchor="middle">Transcript re-enters queue</text>
<!-- Problem annotation -->
<rect x="400" y="55" width="380" height="80" rx="6" fill="#161b22" stroke="#f85149" stroke-width="1"/>
<text x="590" y="75" fill="#f85149" font-size="10" text-anchor="middle" font-weight="600">Problem</text>
<text x="590" y="92" fill="#8b949e" font-size="9" text-anchor="middle">The Transcript re-enters the queue and competes with</text>
<text x="590" y="106" fill="#8b949e" font-size="9" text-anchor="middle">other packets. If an OutputAudio arrives between the STT</text>
<text x="590" y="120" fill="#8b949e" font-size="9" text-anchor="middle">output and the Splitter's processing of it, the order changes.</text>
<text x="590" y="134" fill="#f85149" font-size="9" text-anchor="middle" font-weight="600">This reintroduces nondeterminism.</text>
<!-- Verdict -->
<rect x="200" y="155" width="400" height="30" rx="5" fill="#3fb95015" stroke="#3fb950" stroke-width="1"/>
<text x="400" y="175" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="600">Single pass is the right choice &mdash; keeps everything deterministic</text>
</svg>
</div>
<div class="insight">
<h3>Single pass wins</h3>
<p>If intermediate results re-enter the dispatcher queue, they compete with new external events. A <code>Transcript</code> produced by the STT step could be interleaved with an <code>OutputAudio</code> from the TTS worker. That interleaving depends on timing &mdash; nondeterministic.</p>
<p>With the single-pass model, one packet enters the pipeline, flows through all step functions in order, and every intermediate result is processed in the same tick. No external events can interleave. The pipeline is a pure function: <code>f(states, packet) &rarr; (new_states, effects)</code>.</p>
</div>
<!-- ============ EFFECT EXECUTION: THE REAL PROBLEM ============ -->
<h2>Effect execution: the granularity problem</h2>
<p>Here's a subtle but critical question: <strong>when do effects get executed relative to reading the next packet?</strong></p>
<h3>The naive approach (broken)</h3>
<div class="diagram">
<svg width="800" height="280" viewBox="0 0 800 280">
<text x="400" y="20" fill="#f85149" font-size="11" text-anchor="middle" font-weight="600">BROKEN: execute effects between each tick</text>
<!-- Tick 1 -->
<rect x="20" y="40" width="70" height="25" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="55" y="57" fill="#3fb950" font-size="7" text-anchor="middle">Out1 &rarr; step</text>
<line x1="95" y1="52" x2="105" y2="52" stroke="#30363d" stroke-width="1" marker-end="url(#arr)"/>
<rect x="110" y="40" width="70" height="25" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="145" y="57" fill="#d2a8ff" font-size="7" text-anchor="middle">PlayAudio</text>
<line x1="185" y1="52" x2="195" y2="52" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<rect x="200" y="35" width="80" height="35" rx="3" fill="#f8514922" stroke="#f85149" stroke-width="1"/>
<text x="240" y="50" fill="#f85149" font-size="7" text-anchor="middle">SPEAKER</text>
<text x="240" y="62" fill="#f85149" font-size="6" text-anchor="middle">plays 10ms</text>
<!-- Tick 2 -->
<rect x="20" y="80" width="70" height="25" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="55" y="97" fill="#3fb950" font-size="7" text-anchor="middle">Out2 &rarr; step</text>
<line x1="95" y1="92" x2="105" y2="92" stroke="#30363d" stroke-width="1" marker-end="url(#arr)"/>
<rect x="110" y="80" width="70" height="25" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="145" y="97" fill="#d2a8ff" font-size="7" text-anchor="middle">PlayAudio</text>
<line x1="185" y1="92" x2="195" y2="92" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<rect x="200" y="75" width="80" height="35" rx="3" fill="#f8514922" stroke="#f85149" stroke-width="1"/>
<text x="240" y="90" fill="#f85149" font-size="7" text-anchor="middle">SPEAKER</text>
<text x="240" y="102" fill="#f85149" font-size="6" text-anchor="middle">plays 10ms</text>
<!-- dots -->
<text x="145" y="125" fill="#8b949e" font-size="14" text-anchor="middle">...</text>
<!-- Tick 500 -->
<rect x="20" y="140" width="70" height="25" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="55" y="157" fill="#3fb950" font-size="7" text-anchor="middle">Out500 &rarr; step</text>
<line x1="95" y1="152" x2="105" y2="152" stroke="#30363d" stroke-width="1" marker-end="url(#arr)"/>
<rect x="110" y="140" width="70" height="25" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="145" y="157" fill="#d2a8ff" font-size="7" text-anchor="middle">PlayAudio</text>
<line x1="185" y1="152" x2="195" y2="152" stroke="#d2a8ff" stroke-width="1" marker-end="url(#arr-purple)"/>
<rect x="200" y="135" width="80" height="35" rx="3" fill="#f8514922" stroke="#f85149" stroke-width="1"/>
<text x="240" y="150" fill="#f85149" font-size="7" text-anchor="middle">SPEAKER</text>
<text x="240" y="162" fill="#f85149" font-size="6" text-anchor="middle">plays 10ms</text>
<!-- FINALLY the signal -->
<rect x="20" y="180" width="70" height="25" rx="3" fill="#21262d" stroke="#f85149" stroke-width="2"/>
<text x="55" y="197" fill="#f85149" font-size="7" text-anchor="middle">Signal &rarr; step</text>
<line x1="95" y1="192" x2="105" y2="192" stroke="#30363d" stroke-width="1" marker-end="url(#arr)"/>
<rect x="110" y="180" width="70" height="25" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="145" y="197" fill="#d2a8ff" font-size="7" text-anchor="middle">StopPlay</text>
<!-- Problem callout -->
<rect x="330" y="40" width="450" height="95" rx="8" fill="#161b22" stroke="#f85149" stroke-width="2"/>
<text x="555" y="62" fill="#f85149" font-size="11" text-anchor="middle" font-weight="700">The problem</text>
<text x="555" y="80" fill="#8b949e" font-size="9" text-anchor="middle">If we execute <tspan fill="#d2a8ff">PlayAudio</tspan> after each tick, we're sending</text>
<text x="555" y="95" fill="#8b949e" font-size="9" text-anchor="middle">each chunk to the speaker before reading the next packet.</text>
<text x="555" y="110" fill="#8b949e" font-size="9" text-anchor="middle">500 chunks &times; 10ms = <tspan fill="#f85149" font-weight="600">5 seconds of audio played</tspan> before we see the signal.</text>
<text x="555" y="125" fill="#f85149" font-size="9" text-anchor="middle" font-weight="600">The decision was fast. The effect execution was slow.</text>
<!-- Time annotation -->
<rect x="290" y="38" width="3" height="170" fill="#f85149" opacity="0.3"/>
<text x="300" y="230" fill="#f85149" font-size="8">5 seconds elapsed</text>
<text x="300" y="245" fill="#f85149" font-size="8">before signal is seen</text>
</svg>
</div>
<h3>The solution: decouple decisions from execution</h3>
<p>The pipeline must <strong>never wait for effects</strong>. Effects are always fire-and-forget. The pipeline processes packets as fast as possible, emitting effects as it goes, but never blocking on their execution.</p>
<div class="diagram">
<svg width="800" height="400" viewBox="0 0 800 400">
<text x="400" y="20" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="600">CORRECT: effects are non-blocking, pipeline processes all packets immediately</text>
<!-- Pipeline processes all packets fast -->
<rect x="20" y="40" width="500" height="130" rx="10" fill="#0d1117" stroke="#3fb950" stroke-width="2"/>
<text x="270" y="60" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="700">Pipeline loop (< 1ms total)</text>
<!-- Packets flowing through fast -->
<rect x="35" y="70" width="42" height="22" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="56" y="85" fill="#3fb950" font-size="6" text-anchor="middle">Out1</text>
<rect x="82" y="70" width="42" height="22" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="103" y="85" fill="#3fb950" font-size="6" text-anchor="middle">Out2</text>
<rect x="129" y="70" width="42" height="22" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="150" y="85" fill="#3fb950" font-size="6" text-anchor="middle">Out3</text>
<text x="190" y="85" fill="#3fb950" font-size="9">...</text>
<rect x="208" y="70" width="42" height="22" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="229" y="85" fill="#3fb950" font-size="6" text-anchor="middle">Out500</text>
<rect x="260" y="67" width="60" height="28" rx="4" fill="#21262d" stroke="#f85149" stroke-width="2"/>
<text x="290" y="82" fill="#f85149" font-size="7" text-anchor="middle" font-weight="600">Signal!</text>
<!-- Effects produced (just data, not executed) -->
<text x="270" y="110" fill="#d2a8ff" font-size="9" text-anchor="middle">Effects produced (just data &mdash; not executed yet):</text>
<rect x="35" y="118" width="38" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="0.5"/>
<text x="54" y="130" fill="#d2a8ff" font-size="5" text-anchor="middle">Play1</text>
<rect x="76" y="118" width="38" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="0.5"/>
<text x="95" y="130" fill="#d2a8ff" font-size="5" text-anchor="middle">Play2</text>
<rect x="117" y="118" width="38" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="0.5"/>
<text x="136" y="130" fill="#d2a8ff" font-size="5" text-anchor="middle">Play3</text>
<text x="172" y="130" fill="#d2a8ff" font-size="8">...</text>
<rect x="188" y="118" width="42" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="0.5"/>
<text x="209" y="130" fill="#d2a8ff" font-size="5" text-anchor="middle">Play500</text>
<rect x="236" y="115" width="55" height="24" rx="3" fill="#21262d" stroke="#f85149" stroke-width="1.5"/>
<text x="263" y="131" fill="#f85149" font-size="6" text-anchor="middle" font-weight="600">StopPlay</text>
<!-- Step function saw the signal and produced Abort -->
<text x="270" y="157" fill="#3fb950" font-size="8" text-anchor="middle">Step function processed all 501 packets in &lt;1ms. StopPlayback cancels the PlayAudios.</text>
<!-- Now what happens to effects? -->
<line x1="270" y1="170" x2="270" y2="195" stroke="#d2a8ff" stroke-width="1.5" marker-end="url(#arr-purple)"/>
<!-- Effect resolution -->
<rect x="60" y="200" width="420" height="95" rx="8" fill="#161b22" stroke="#d2a8ff" stroke-width="2"/>
<text x="270" y="222" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="700">Effect resolution (still deterministic)</text>
<text x="270" y="242" fill="#8b949e" font-size="9" text-anchor="middle">The pipeline sees the full effect list before executing anything:</text>
<text x="270" y="260" fill="#d2a8ff" font-size="9" text-anchor="middle">[Play1, Play2, ... Play500, <tspan fill="#f85149" font-weight="600">StopPlayback</tspan>]</text>
<text x="270" y="280" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">StopPlayback at the end cancels all preceding PlayAudios.</text>
<text x="270" y="292" fill="#8b949e" font-size="8" text-anchor="middle">Net result: nothing gets played. Speaker is stopped immediately.</text>
<!-- Alternative: batch processing -->
<rect x="520" y="200" width="270" height="95" rx="8" fill="#161b22" stroke="#f0883e" stroke-width="1.5"/>
<text x="655" y="222" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="700">Or: batch + optimize</text>
<text x="655" y="242" fill="#8b949e" font-size="9" text-anchor="middle">Before sending to executor,</text>
<text x="655" y="258" fill="#8b949e" font-size="9" text-anchor="middle">scan the effect list:</text>
<text x="655" y="278" fill="#f0883e" font-size="9" text-anchor="middle">if StopPlayback present,</text>
<text x="655" y="292" fill="#f0883e" font-size="9" text-anchor="middle" font-weight="600">drop all PlayAudio before it</text>
<!-- The key insight -->
<rect x="100" y="320" width="600" height="65" rx="8" fill="#3fb95015" stroke="#3fb950" stroke-width="1.5"/>
<text x="400" y="342" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="700">The key: process ALL packets first, THEN deal with effects</text>
<text x="400" y="360" fill="#8b949e" font-size="9" text-anchor="middle">The pipeline reads the entire queue (or a time-bounded batch), runs all step functions,</text>
<text x="400" y="375" fill="#8b949e" font-size="9" text-anchor="middle">collects all effects, then resolves them. A StopPlayback cancels earlier PlayAudios in the same batch.</text>
</svg>
</div>
<div class="principle">
<p><strong>The rule:</strong> The pipeline must never execute an effect between reading two packets. It reads a batch of packets, processes them all through the step functions (microseconds), then resolves the collected effects. If a cancellation effect (like <code>StopPlayback</code>) appears in the batch, it cancels the preceding effects of the type it targets. This is still deterministic &mdash; the effect list is ordered, and the resolution rules are fixed.</p>
</div>
<h3>How batching works</h3>
<div class="diagram">
<svg width="800" height="250" viewBox="0 0 800 250">
<!-- Queue -->
<text x="400" y="20" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="600">Dispatcher queue</text>
<rect x="30" y="30" width="740" height="35" rx="6" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="50" y="50" fill="#3fb950" font-size="7">Out1</text>
<text x="85" y="50" fill="#3fb950" font-size="7">Out2</text>
<text x="120" y="50" fill="#3fb950" font-size="7">Out3</text>
<text x="155" y="50" fill="#8b949e" font-size="8">...</text>
<text x="180" y="50" fill="#3fb950" font-size="7">Out500</text>
<text x="230" y="50" fill="#f85149" font-size="7" font-weight="600">Signal</text>
<text x="280" y="50" fill="#58a6ff" font-size="7">InAudio</text>
<text x="330" y="50" fill="#58a6ff" font-size="7">InAudio</text>
<text x="380" y="50" fill="#3fb950" font-size="7">Out501</text>
<text x="430" y="50" fill="#3fb950" font-size="7">Out502</text>
<text x="470" y="50" fill="#8b949e" font-size="7">(more arriving...)</text>
<!-- Batch 1 -->
<rect x="30" y="30" width="260" height="35" rx="6" fill="none" stroke="#3fb950" stroke-width="2"/>
<text x="160" y="80" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Batch 1: drain queue (or time-bound)</text>
<line x1="160" y1="88" x2="160" y2="105" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arr-green)"/>
<!-- Process batch -->
<rect x="50" y="110" width="220" height="55" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="160" y="128" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Process all 502 packets</text>
<text x="160" y="143" fill="#8b949e" font-size="8" text-anchor="middle">step functions run, effects collected</text>
<text x="160" y="155" fill="#8b949e" font-size="8" text-anchor="middle">~0.5ms total</text>
<!-- Effect list -->
<line x1="160" y1="170" x2="160" y2="190" stroke="#d2a8ff" stroke-width="1.5" marker-end="url(#arr-purple)"/>
<rect x="30" y="195" width="350" height="45" rx="6" fill="#161b22" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="205" y="212" fill="#d2a8ff" font-size="9" text-anchor="middle">Effects: [Play1..Play500, <tspan fill="#f85149" font-weight="600">StopPlayback</tspan>, AccumAudio, AccumAudio]</text>
<text x="205" y="230" fill="#3fb950" font-size="9" text-anchor="middle">Resolve: StopPlayback cancels Play1..Play500. Send remaining to executor.</text>
<!-- Batch 2 waiting -->
<rect x="295" y="30" width="200" height="35" rx="6" fill="none" stroke="#8b949e" stroke-width="1" stroke-dasharray="4,3"/>
<text x="395" y="80" fill="#8b949e" font-size="8" text-anchor="middle">Batch 2: next iteration</text>
<!-- Annotation -->
<rect x="430" y="110" width="350" height="80" rx="6" fill="#161b22" stroke="#30363d" stroke-width="1"/>
<text x="605" y="132" fill="#79c0ff" font-size="10" text-anchor="middle" font-weight="600">Batch size options</text>
<text x="605" y="150" fill="#8b949e" font-size="8" text-anchor="middle"><tspan font-weight="600" fill="#c9d1d9">Drain all:</tspan> process everything currently in the queue</text>
<text x="605" y="165" fill="#8b949e" font-size="8" text-anchor="middle"><tspan font-weight="600" fill="#c9d1d9">Time-bound:</tspan> process for up to N ms, then execute effects</text>
<text x="605" y="180" fill="#8b949e" font-size="8" text-anchor="middle"><tspan font-weight="600" fill="#c9d1d9">Count-bound:</tspan> process up to N packets, then execute</text>
</svg>
</div>
<div class="insight">
<h3>Effect cancellation is deterministic</h3>
<p>When the pipeline collects effects from a batch, it can resolve cancellations before executing anything. <code>StopPlayback</code> cancels all preceding <code>PlayAudio</code> effects in the same batch. <code>Abort</code> cancels preceding <code>StartSynthesis</code>/<code>FeedText</code>. These resolution rules are fixed and deterministic &mdash; they only depend on the effect list, which is determined by the packet sequence, which is recorded.</p>
<p>The effect executor only sees the <em>resolved</em> effect list &mdash; the net effects after cancellations. In the barge-in case, it sees: "stop the speaker" (no play commands). Clean, immediate, no 5-second delay.</p>
</div>
<!-- ============ THE BATCH SIZE TRADEOFF ============ -->
<h2>The batch size tradeoff</h2>
<p>Batch size is the critical tuning parameter. It determines the worst-case latency between a decision being made and an effect being executed.</p>
<div class="diagram">
<svg width="800" height="440" viewBox="0 0 800 440">
<text x="400" y="22" fill="#79c0ff" font-size="12" text-anchor="middle" font-weight="700">Batch size vs. latency tradeoff</text>
<!-- Scenario setup -->
<text x="400" y="48" fill="#8b949e" font-size="9" text-anchor="middle">Scenario: TTS produces audio. No interruption. User is listening.</text>
<text x="400" y="62" fill="#8b949e" font-size="9" text-anchor="middle">We want audio to start playing as soon as the first chunk is ready.</text>
<!-- Small batch -->
<rect x="20" y="80" width="370" height="155" rx="8" fill="#161b22" stroke="#3fb950" stroke-width="1.5"/>
<text x="205" y="100" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="700">Small batch (e.g. 1-10 packets)</text>
<!-- Timeline -->
<line x1="40" y1="120" x2="370" y2="120" stroke="#30363d" stroke-width="1"/>
<rect x="45" y="125" width="30" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="60" y="137" fill="#3fb950" font-size="6" text-anchor="middle">Out1</text>
<rect x="80" y="125" width="40" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/>
<text x="100" y="137" fill="#d2a8ff" font-size="6" text-anchor="middle">resolve</text>
<rect x="125" y="125" width="40" height="18" rx="2" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="145" y="137" fill="#f0883e" font-size="6" text-anchor="middle">execute</text>
<text x="177" y="137" fill="#3fb950" font-size="8">&rarr;</text>
<rect x="190" y="125" width="30" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="205" y="137" fill="#3fb950" font-size="6" text-anchor="middle">Out2</text>
<rect x="225" y="125" width="40" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/>
<text x="245" y="137" fill="#d2a8ff" font-size="6" text-anchor="middle">resolve</text>
<rect x="270" y="125" width="40" height="18" rx="2" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="290" y="137" fill="#f0883e" font-size="6" text-anchor="middle">execute</text>
<text x="325" y="137" fill="#8b949e" font-size="8">...</text>
<text x="205" y="165" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Low latency to first audio: ~instant</text>
<text x="205" y="180" fill="#8b949e" font-size="8" text-anchor="middle">Audio starts playing as soon as first chunk arrives.</text>
<text x="205" y="195" fill="#8b949e" font-size="8" text-anchor="middle">But: if interrupt arrives 5 packets later, 4 chunks</text>
<text x="205" y="208" fill="#8b949e" font-size="8" text-anchor="middle">already executed before cancellation was possible.</text>
<text x="205" y="225" fill="#f0883e" font-size="8" text-anchor="middle">Interrupt latency: up to (batch_size &times; chunk_duration)</text>
<!-- Large batch -->
<rect x="410" y="80" width="370" height="155" rx="8" fill="#161b22" stroke="#f0883e" stroke-width="1.5"/>
<text x="595" y="100" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="700">Large batch (e.g. drain queue)</text>
<!-- Timeline -->
<line x1="430" y1="120" x2="760" y2="120" stroke="#30363d" stroke-width="1"/>
<rect x="435" y="125" width="18" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="0.5"/>
<rect x="455" y="125" width="18" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="0.5"/>
<rect x="475" y="125" width="18" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="0.5"/>
<text x="505" y="137" fill="#8b949e" font-size="8">...</text>
<rect x="520" y="125" width="18" height="18" rx="2" fill="#21262d" stroke="#3fb950" stroke-width="0.5"/>
<rect x="545" y="125" width="50" height="18" rx="2" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/>
<text x="570" y="137" fill="#d2a8ff" font-size="6" text-anchor="middle">resolve all</text>
<rect x="600" y="125" width="50" height="18" rx="2" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="625" y="137" fill="#f0883e" font-size="6" text-anchor="middle">execute net</text>
<text x="595" y="165" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Best cancellation: interrupts cancel everything</text>
<text x="595" y="180" fill="#8b949e" font-size="8" text-anchor="middle">StopPlayback cancels all 500 PlayAudios in the batch.</text>
<text x="595" y="195" fill="#8b949e" font-size="8" text-anchor="middle">Zero audio played before abort.</text>
<text x="595" y="210" fill="#f85149" font-size="8" text-anchor="middle">But: first audio only plays after the ENTIRE batch</text>
<text x="595" y="223" fill="#f85149" font-size="8" text-anchor="middle">is processed. Startup latency = time to fill batch.</text>
<!-- The comparison with current design -->
<rect x="20" y="250" width="760" height="180" rx="10" fill="#161b22" stroke="#79c0ff" stroke-width="1.5"/>
<text x="400" y="272" fill="#79c0ff" font-size="11" text-anchor="middle" font-weight="700">Comparison with current design (merge + priority)</text>
<rect x="40" y="285" width="340" height="60" rx="6" fill="#0d1117" stroke="#f0883e" stroke-width="1"/>
<text x="210" y="303" fill="#f0883e" font-size="9" text-anchor="middle" font-weight="600">Current: merge with signal priority</text>
<text x="210" y="318" fill="#8b949e" font-size="8" text-anchor="middle">Startup latency: ~0 (audio plays immediately)</text>
<text x="210" y="333" fill="#8b949e" font-size="8" text-anchor="middle">Interrupt latency: bounded by hardware buffers</text>
<text x="210" y="338" fill="#8b949e" font-size="7" text-anchor="middle">(sounddevice buffer, inter-processor queue depth)</text>
<rect x="400" y="285" width="360" height="60" rx="6" fill="#0d1117" stroke="#3fb950" stroke-width="1"/>
<text x="580" y="303" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Proposed: batched dispatcher</text>
<text x="580" y="318" fill="#8b949e" font-size="8" text-anchor="middle">Startup latency: bounded by batch interval</text>
<text x="580" y="333" fill="#8b949e" font-size="8" text-anchor="middle">Interrupt latency: bounded by batch interval</text>
<text x="580" y="338" fill="#8b949e" font-size="7" text-anchor="middle">(a parameter we control, not hardware)</text>
<!-- Key insight -->
<rect x="60" y="360" width="680" height="55" rx="6" fill="#79c0ff15" stroke="#79c0ff" stroke-width="1"/>
<text x="400" y="380" fill="#79c0ff" font-size="10" text-anchor="middle" font-weight="700">Both designs have latency bounded by a buffer.</text>
<text x="400" y="396" fill="#8b949e" font-size="9" text-anchor="middle">Current: bounded by nondeterministic hardware/OS buffers (sounddevice, asyncio queues).</text>
<text x="400" y="410" fill="#8b949e" font-size="9" text-anchor="middle">Proposed: bounded by a deterministic batch parameter we choose and record.</text>
</svg>
</div>
<h3>Fixed-count batching is the right primitive</h3>
<p>Of the three batch strategies, <strong>fixed-count</strong> is the only one that's fully deterministic. Time-bound batching reintroduces a timing dependency (how many packets arrived in 5ms varies by system load). Drain-queue depends on what's in the queue at the moment you drain (nondeterministic).</p>
<p>Fixed-count means: "process exactly N packets, resolve effects, execute, repeat." The same recording with the same N always produces the same batch boundaries, the same effect lists, the same cancellations. On any hardware.</p>
<table class="compare-table">
<tr>
<th>Batch strategy</th>
<th>Deterministic?</th>
<th>Startup latency</th>
<th>Interrupt latency</th>
<th>Cancellation</th>
</tr>
<tr>
<td>Fixed count = 1</td>
<td class="check">Yes</td>
<td class="check">~0</td>
<td class="cross">None (every effect fires immediately)</td>
<td class="cross">None</td>
</tr>
<tr>
<td><strong>Fixed count = N</strong></td>
<td class="check"><strong>Yes</strong></td>
<td class="maybe"><strong>Up to N packets</strong></td>
<td class="maybe"><strong>Up to N packets</strong></td>
<td class="check"><strong>Within batch</strong></td>
</tr>
<tr>
<td>Time-bound (e.g. 5ms)</td>
<td class="cross">No &mdash; packet count per window varies</td>
<td class="maybe">Up to 5ms</td>
<td class="maybe">Up to 5ms</td>
<td class="maybe">Partial</td>
</tr>
<tr>
<td>Drain queue</td>
<td class="cross">No &mdash; queue depth varies by timing</td>
<td class="cross">Unbounded</td>
<td class="check">~0</td>
<td class="check">Full</td>
</tr>
<tr>
<td>Current design (merge)</td>
<td class="cross">No</td>
<td class="check">~0</td>
<td class="maybe">Bounded by HW buffers</td>
<td class="maybe">Must drain speaker</td>
</tr>
</table>
<h3>How to pick the batch size</h3>
<p>The batch size determines the worst-case decision latency: how many packets can arrive before a cancellation takes effect. To target a specific latency, we need to know the packet arrival rate.</p>
<div class="diagram">
<svg width="800" height="320" viewBox="0 0 800 320">
<text x="400" y="22" fill="#79c0ff" font-size="11" text-anchor="middle" font-weight="700">Choosing a batch size for ~5ms decision latency</text>
<!-- The math -->
<rect x="50" y="40" width="700" height="130" rx="8" fill="#161b22" stroke="#79c0ff" stroke-width="1.5"/>
<text x="400" y="65" fill="#79c0ff" font-size="10" text-anchor="middle" font-weight="600">What arrives in 5ms?</text>
<text x="80" y="90" fill="#8b949e" font-size="9">Audio chunks (10ms each at 16kHz):</text>
<text x="500" y="90" fill="#c9d1d9" font-size="9">~0.5 packets per 5ms</text>
<text x="80" y="108" fill="#8b949e" font-size="9">TTS output (10ms chunks, continuous):</text>
<text x="500" y="108" fill="#c9d1d9" font-size="9">~0.5 packets per 5ms</text>
<text x="80" y="126" fill="#8b949e" font-size="9">LLM tokens (streaming, ~30 tokens/sec):</text>
<text x="500" y="126" fill="#c9d1d9" font-size="9">~0.15 packets per 5ms</text>
<text x="80" y="144" fill="#8b949e" font-size="9">Signals (sporadic):</text>
<text x="500" y="144" fill="#c9d1d9" font-size="9">~0 packets per 5ms</text>
<line x1="80" y1="152" x2="680" y2="152" stroke="#30363d" stroke-width="1"/>
<text x="80" y="167" fill="#3fb950" font-size="10" font-weight="600">Total: ~1-2 packets per 5ms in normal operation</text>
<!-- Recommendation -->
<rect x="50" y="185" width="700" height="120" rx="8" fill="#161b22" stroke="#3fb950" stroke-width="1.5"/>
<text x="400" y="210" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="700">Recommended: batch_size = 5</text>
<text x="400" y="232" fill="#8b949e" font-size="9" text-anchor="middle">In normal operation (1-2 packets/5ms), batches are processed as fast as they fill.</text>
<text x="400" y="248" fill="#8b949e" font-size="9" text-anchor="middle">Under burst (TTS dumps 500 chunks), 100 batches of 5 = signal seen within 5 packets.</text>
<text x="400" y="264" fill="#8b949e" font-size="9" text-anchor="middle">Worst case: 4 audio chunks play before abort &mdash; that's 40ms of audio. Imperceptible.</text>
<text x="400" y="290" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">5 packets &times; ~1&micro;s/packet = 5&micro;s processing per batch. Near-zero overhead.</text>
<!-- Recording format -->
<rect x="150" y="305" width="500" height="15" rx="3" fill="#21262d"/>
<text x="400" y="316" fill="#f0883e" font-size="8" text-anchor="middle">Recording header: <tspan fill="#79c0ff">{batch_size: 5, ...}</tspan> &mdash; replay uses same value on any hardware</text>
</svg>
</div>
<div class="principle">
<p><strong>The recording contract:</strong> The batch size is recorded alongside the packet sequence. Replay feeds the same packets in the same batches. A faster machine processes each batch quicker, but the <em>contents</em> are identical. The wall clock is irrelevant during replay &mdash; packets come from the recording, not from a live queue. Any system, same results.</p>
</div>
<div class="insight">
<h3>Tuning is empirical, not theoretical</h3>
<p>The "right" batch size depends on actual packet rates in production, which depend on audio chunk sizes, TTS model speed, LLM token rate, and network conditions. Start with <code>batch_size = 5</code>, measure the interrupt latency and startup latency in real sessions, and adjust. The architecture doesn't change &mdash; only the number does.</p>
<p>If a deployment needs tighter interrupt response, reduce the batch size. If it needs better cancellation efficiency, increase it. Both are just config changes, not code changes. And both produce deterministic, replayable recordings.</p>
</div>
<!-- ============ TILL'S QUESTION: SIGNAL PRIORITY ============ -->
<h2>But what about 5 seconds of buffered audio?</h2>
<p>A natural concern: if the TTS worker has produced 5 seconds of <code>OutputAudio</code> chunks (say, 500 packets at 10ms each), and they're sitting in the dispatcher queue, does the <code>UserStartedSpeakingSignal</code> have to wait behind all 500 before the pipeline sees it?</p>
<p><strong>Yes, it does.</strong> And that's fine. Here's why:</p>
<div class="diagram">
<svg width="800" height="520" viewBox="0 0 800 520">
<!-- Title -->
<text x="400" y="22" fill="#f0883e" font-size="11" text-anchor="middle" font-weight="600">BARGE-IN WITH BUFFERED AUDIO: what actually happens</text>
<!-- The queue -->
<text x="400" y="50" fill="#8b949e" font-size="9" text-anchor="middle">Dispatcher queue at the moment user starts speaking:</text>
<!-- Audio packets -->
<rect x="15" y="62" width="38" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="34" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out1</text>
<rect x="57" y="62" width="38" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="76" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out2</text>
<rect x="99" y="62" width="38" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="118" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out3</text>
<text x="155" y="80" fill="#3fb950" font-size="10" text-anchor="middle">...</text>
<rect x="170" y="62" width="50" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="195" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out498</text>
<rect x="224" y="62" width="50" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="249" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out499</text>
<rect x="278" y="62" width="50" height="28" rx="3" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="303" y="80" fill="#3fb950" font-size="6" text-anchor="middle">Out500</text>
<!-- Signal at the end -->
<rect x="340" y="58" width="75" height="35" rx="4" fill="#21262d" stroke="#f85149" stroke-width="2"/>
<text x="377" y="74" fill="#f85149" font-size="7" text-anchor="middle" font-weight="600">UserStarted</text>
<text x="377" y="85" fill="#f85149" font-size="7" text-anchor="middle" font-weight="600">Speaking</text>
<!-- Queue direction -->
<line x1="15" y1="100" x2="415" y2="100" stroke="#f0883e" stroke-width="1" marker-end="url(#arr-orange)"/>
<text x="215" y="115" fill="#f0883e" font-size="8" text-anchor="middle">processing order &rarr;</text>
<!-- The key insight: TIME -->
<rect x="460" y="55" width="330" height="120" rx="8" fill="#161b22" stroke="#3fb950" stroke-width="1.5"/>
<text x="625" y="78" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="700">Step functions are instant</text>
<text x="625" y="100" fill="#8b949e" font-size="9" text-anchor="middle">Processing 500 OutputAudio packets:</text>
<rect x="480" y="110" width="140" height="22" rx="3" fill="#21262d" stroke="#30363d" stroke-width="1"/>
<text x="550" y="125" fill="#8b949e" font-size="8" text-anchor="middle">Step fn: check state, emit effect</text>
<text x="660" y="125" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">~1 &micro;s each</text>
<text x="625" y="148" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="700">500 packets &times; 1&micro;s = 0.5ms total</text>
<text x="625" y="165" fill="#8b949e" font-size="9" text-anchor="middle">The signal is processed &lt; 1ms after arriving</text>
<!-- Comparison with current design -->
<rect x="20" y="190" width="370" height="140" rx="8" fill="#161b22" stroke="#f85149" stroke-width="1.5"/>
<text x="205" y="212" fill="#f85149" font-size="10" text-anchor="middle" font-weight="700">Current design: merge with priority</text>
<text x="205" y="232" fill="#8b949e" font-size="9" text-anchor="middle">Signal gets priority in the stream.</text>
<text x="205" y="247" fill="#8b949e" font-size="9" text-anchor="middle">But the speaker is <tspan font-weight="600" fill="#f85149">actually playing</tspan> those 500 packets.</text>
<text x="205" y="262" fill="#8b949e" font-size="9" text-anchor="middle">Even with priority, you still need to stop the speaker.</text>
<text x="205" y="282" fill="#8b949e" font-size="9" text-anchor="middle">The abort latency is in the speaker thread, not the queue.</text>
<!-- Proposed -->
<rect x="410" y="190" width="380" height="140" rx="8" fill="#161b22" stroke="#3fb950" stroke-width="1.5"/>
<text x="600" y="212" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="700">Proposed: FIFO dispatcher</text>
<text x="600" y="232" fill="#8b949e" font-size="9" text-anchor="middle">Signal waits behind 500 packets in the queue.</text>
<text x="600" y="247" fill="#8b949e" font-size="9" text-anchor="middle">Pipeline processes all 500 in &lt; 1ms (just logic).</text>
<text x="600" y="262" fill="#8b949e" font-size="9" text-anchor="middle">Hits the signal, emits <tspan fill="#d2a8ff">StopPlayback</tspan> effect.</text>
<text x="600" y="282" fill="#8b949e" font-size="9" text-anchor="middle">Effect executor aborts the speaker immediately.</text>
<text x="600" y="300" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Same real-world latency. Fully replayable.</text>
<!-- Timeline comparison -->
<text x="400" y="355" fill="#79c0ff" font-size="10" text-anchor="middle" font-weight="600">Real-world timeline comparison</text>
<!-- Current design timeline -->
<text x="50" y="380" fill="#f85149" font-size="9" font-weight="600">Current:</text>
<line x1="110" y1="377" x2="750" y2="377" stroke="#30363d" stroke-width="1"/>
<rect x="110" y="367" width="3" height="20" fill="#f85149"/>
<text x="112" y="400" fill="#8b949e" font-size="7">signal arrives</text>
<rect x="115" y="370" width="30" height="14" rx="2" fill="#f8514933" stroke="#f85149" stroke-width="0.5"/>
<text x="130" y="380" fill="#f85149" font-size="6" text-anchor="middle">merge</text>
<rect x="150" y="370" width="80" height="14" rx="2" fill="#d2a8ff33" stroke="#d2a8ff" stroke-width="0.5"/>
<text x="190" y="380" fill="#d2a8ff" font-size="6" text-anchor="middle">abort + drain queue</text>
<rect x="235" y="370" width="60" height="14" rx="2" fill="#f8514933" stroke="#f85149" stroke-width="0.5"/>
<text x="265" y="380" fill="#f85149" font-size="6" text-anchor="middle">stop speaker</text>
<rect x="300" y="367" width="3" height="20" fill="#3fb950"/>
<text x="302" y="400" fill="#3fb950" font-size="7">silence</text>
<!-- Proposed design timeline -->
<text x="50" y="435" fill="#3fb950" font-size="9" font-weight="600">Proposed:</text>
<line x1="110" y1="432" x2="750" y2="432" stroke="#30363d" stroke-width="1"/>
<rect x="110" y="422" width="3" height="20" fill="#f85149"/>
<text x="112" y="455" fill="#8b949e" font-size="7">signal arrives</text>
<rect x="115" y="425" width="15" height="14" rx="2" fill="#3fb95033" stroke="#3fb950" stroke-width="0.5"/>
<text x="122" y="435" fill="#3fb950" font-size="5" text-anchor="middle">&lt;1ms</text>
<rect x="135" y="425" width="60" height="14" rx="2" fill="#f8514933" stroke="#f85149" stroke-width="0.5"/>
<text x="165" y="435" fill="#f85149" font-size="6" text-anchor="middle">stop speaker</text>
<rect x="200" y="422" width="3" height="20" fill="#3fb950"/>
<text x="202" y="455" fill="#3fb950" font-size="7">silence</text>
<!-- Annotation -->
<text x="500" y="435" fill="#8b949e" font-size="8">No queue draining needed &mdash; step functions already</text>
<text x="500" y="448" fill="#8b949e" font-size="8">processed the buffered packets (as logic, not audio)</text>
<!-- Bottom comparison -->
<rect x="150" y="475" width="500" height="35" rx="6" fill="#3fb95015" stroke="#3fb950" stroke-width="1"/>
<text x="400" y="492" fill="#3fb950" font-size="10" text-anchor="middle" font-weight="600">The bottleneck is always the speaker stop, never the queue processing.</text>
<text x="400" y="505" fill="#8b949e" font-size="9" text-anchor="middle">Both designs have the same real-world barge-in latency.</text>
</svg>
</div>
<div class="principle">
<p><strong>The insight:</strong> In the current design, audio packets in the queue represent <em>audio that will be played</em>. In the proposed design, audio packets in the queue represent <em>decisions to be made about audio</em>. Making 500 decisions takes &lt;1ms. Playing 500 audio chunks takes 5 seconds. The signal doesn't need priority because the queue is processed at decision speed, not playback speed.</p>
</div>
<p>The actual barge-in latency in both designs is dominated by the same thing: how fast the speaker thread can be stopped. That's an effect execution concern, not a pipeline concern. And it's the same code in both designs &mdash; set an abort event, drain the hardware buffer.</p>
<div class="insight">
<h3>The bottom line</h3>
<p>The current architecture has nondeterminism scattered through the pipeline (worker threads inside processors, merge interleaving). This makes replay and Hypothesis shrinking impossible for integration-level tests.</p>
<p>The proposed architecture pushes all nondeterminism to the edges. The pipeline becomes a pure function of its input sequence. You can record, replay, shrink, and property-test the entire system &mdash; not just individual step functions.</p>
<p>The step functions we've already built are the foundation. The missing piece is the Dispatcher that serializes all inputs into one deterministic stream.</p>
<p>Signal priority is not needed in the dispatcher because the pipeline processes packets in batches &mdash; it reads all available packets, runs all step functions, resolves effect cancellations (a <code>StopPlayback</code> cancels preceding <code>PlayAudio</code> effects), and only then executes the net effects. A signal behind 500 audio packets in the queue results in zero audio being played &mdash; the <code>StopPlayback</code> cancels all 500 <code>PlayAudio</code> effects before any reach the speaker.</p>
</div>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Superagent: Deterministic Pipeline Architecture</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; background: #0d1117; color: #c9d1d9; line-height: 1.6; padding: 2rem; max-width: 1100px; margin: 0 auto; }
h1 { color: #58a6ff; font-size: 1.8rem; margin-bottom: 0.3rem; }
h2 { color: #79c0ff; font-size: 1.3rem; margin: 2.5rem 0 1rem; border-bottom: 1px solid #21262d; padding-bottom: 0.5rem; }
h3 { color: #d2a8ff; font-size: 1.1rem; margin: 1.5rem 0 0.75rem; }
p { color: #8b949e; margin-bottom: 0.75rem; }
.subtitle { color: #8b949e; font-size: 1rem; margin-bottom: 2rem; }
strong { color: #c9d1d9; }
code { background: #21262d; padding: 0.15rem 0.4rem; border-radius: 4px; font-size: 0.85rem; color: #79c0ff; }
ul { padding-left: 1.25rem; }
li { color: #8b949e; margin-bottom: 0.4rem; }
.principle { background: #161b22; border-left: 3px solid #58a6ff; padding: 1rem 1.5rem; margin: 1.5rem 0; border-radius: 0 8px 8px 0; }
.principle p { color: #c9d1d9; margin: 0; }
.diagram { background: #161b22; border: 1px solid #30363d; border-radius: 12px; padding: 1.5rem; margin: 1.5rem 0; overflow-x: auto; }
.testing { background: #161b22; border: 1px solid #3fb950; border-radius: 8px; padding: 1.5rem; margin: 1.5rem 0; }
.testing h4 { color: #3fb950; margin-top: 0; }
.file-ref { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 0.75rem 1.25rem; margin: 0.75rem 0; }
.file-ref .path { color: #79c0ff; font-family: monospace; font-size: 0.9rem; }
.file-ref .desc { color: #8b949e; font-size: 0.85rem; margin-top: 0.15rem; }
.code-block { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 1rem 1.5rem; margin: 1rem 0; overflow-x: auto; }
.code-block pre { color: #c9d1d9; font-family: 'SF Mono', monospace; font-size: 0.85rem; line-height: 1.5; white-space: pre; }
.changelog { background: #161b22; border: 1px solid #f0883e; border-radius: 8px; padding: 1.5rem; margin: 2rem 0; }
.changelog h3 { color: #f0883e; margin-top: 0; }
svg text { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; }
.toc { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 1.5rem; margin: 1.5rem 0; }
.toc h3 { margin-top: 0; color: #79c0ff; }
.toc a { display: block; padding: 0.25rem 0; font-size: 0.9rem; color: #58a6ff; text-decoration: none; }
.toc a:hover { text-decoration: underline; }
.num { color: #d2a8ff; font-weight: 700; margin-right: 0.5rem; }
.badge { display: inline-block; font-size: 0.7rem; font-weight: 600; padding: 0.1rem 0.5rem; border-radius: 10px; margin-left: 0.5rem; }
.badge.det { background: #3fb95022; color: #3fb950; border: 1px solid #3fb95055; }
.badge.nondet { background: #f8514922; color: #f85149; border: 1px solid #f8514955; }
.badge.boundary { background: #f0883e22; color: #f0883e; border: 1px solid #f0883e55; }
table { border-collapse: collapse; width: 100%; margin: 1rem 0; }
th, td { border: 1px solid #30363d; padding: 0.5rem 0.75rem; text-align: left; font-size: 0.85rem; }
th { background: #161b22; color: #79c0ff; font-weight: 600; }
td { color: #8b949e; }
</style>
</head>
<body>
<h1>Deterministic Pipeline Architecture</h1>
<p class="subtitle">Full replayability through a two-lane dispatcher, pure step functions with turn tracking, and single-packet ticks. 455 tests. 3 bugs found by auto-generated injection framework.</p>
<div class="toc">
<h3>Contents</h3>
<a href="#overview"><span class="num">1</span> System Overview</a>
<a href="#dispatcher"><span class="num">2</span> The Two-Lane Dispatcher</a>
<a href="#recording"><span class="num">3</span> Recording &amp; Replay</a>
<a href="#tick-runner"><span class="num">4</span> The Tick Runner</a>
<a href="#step-functions"><span class="num">5</span> Step Functions</a>
<a href="#turn-numbers"><span class="num">6</span> Turn Numbers</a>
<a href="#error-handling"><span class="num">7</span> Error Handling</a>
<a href="#effect-executor"><span class="num">8</span> The Effect Executor</a>
<a href="#testing"><span class="num">9</span> Testing Framework</a>
<a href="#developer-guide"><span class="num">10</span> Developer Guide</a>
<a href="#code-map"><span class="num">11</span> Code Map</a>
<a href="#changelog"><span class="num">12</span> Changelog</a>
</div>
<!-- ===== 1. OVERVIEW ===== -->
<h2 id="overview">1. System Overview</h2>
<p>The system has three zones. Everything inside the <strong>deterministic zone</strong> is a pure function of its inputs. The <strong>recording point</strong> sits at the output of the dispatcher &mdash; record there, replay anywhere, get identical behavior.</p>
<div class="diagram">
<svg width="1040" height="270" viewBox="0 0 1040 270">
<defs>
<marker id="ag" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto"><path d="M0 0L10 5L0 10z" fill="#3fb950"/></marker>
<marker id="ar" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto"><path d="M0 0L10 5L0 10z" fill="#f85149"/></marker>
<marker id="ap" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto"><path d="M0 0L10 5L0 10z" fill="#d2a8ff"/></marker>
</defs>
<!-- LEFT: Nondeterministic inputs -->
<rect x="5" y="8" width="180" height="210" rx="10" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="5,3"/>
<text x="95" y="25" fill="#f85149" font-size="9" text-anchor="middle" font-weight="600">NONDETERMINISTIC</text>
<rect x="14" y="38" width="162" height="24" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1"/><text x="95" y="54" fill="#f85149" font-size="8" text-anchor="middle">WebRTC audio</text>
<rect x="14" y="68" width="162" height="24" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1"/><text x="95" y="84" fill="#f85149" font-size="8" text-anchor="middle">TTS worker output</text>
<rect x="14" y="98" width="162" height="24" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1"/><text x="95" y="114" fill="#f85149" font-size="8" text-anchor="middle">STT worker output</text>
<rect x="14" y="128" width="162" height="24" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1"/><text x="95" y="144" fill="#f85149" font-size="8" text-anchor="middle">VAD / barge-in signals</text>
<rect x="14" y="158" width="162" height="24" rx="4" fill="#21262d" stroke="#f85149" stroke-width="1"/><text x="95" y="174" fill="#f85149" font-size="8" text-anchor="middle">LLM token stream</text>
<line x1="190" y1="110" x2="212" y2="110" stroke="#f85149" stroke-width="1.5" marker-end="url(#ar)"/>
<!-- DISPATCHER -->
<rect x="217" y="33" width="118" height="160" rx="8" fill="#21262d" stroke="#f0883e" stroke-width="2.5"/>
<text x="276" y="52" fill="#f0883e" font-size="10" text-anchor="middle" font-weight="700">Dispatcher</text>
<rect x="232" y="62" width="86" height="26" rx="4" fill="#0d1117" stroke="#f85149" stroke-width="1"/>
<text x="275" y="79" fill="#f85149" font-size="8" text-anchor="middle">Signal Lane</text>
<rect x="232" y="96" width="86" height="26" rx="4" fill="#0d1117" stroke="#58a6ff" stroke-width="1"/>
<text x="275" y="113" fill="#58a6ff" font-size="8" text-anchor="middle">Data Lane</text>
<text x="276" y="143" fill="#f0883e" font-size="8" text-anchor="middle">signals first</text>
<text x="276" y="156" fill="#f0883e" font-size="8" text-anchor="middle">then data</text>
<!-- RECORD POINT -->
<line x1="340" y1="110" x2="450" y2="110" stroke="#3fb950" stroke-width="2.5" marker-end="url(#ag)"/>
<circle cx="395" cy="110" r="9" fill="#f0883e" stroke="#0d1117" stroke-width="2"/>
<text x="395" y="113" fill="#0d1117" font-size="8" text-anchor="middle" font-weight="700">R</text>
<text x="395" y="94" fill="#f0883e" font-size="7" text-anchor="middle" font-weight="600">RECORD / REPLAY</text>
<text x="395" y="130" fill="#f0883e" font-size="6" text-anchor="middle">one packet at a time</text>
<!-- DETERMINISTIC ZONE -->
<rect x="455" y="8" width="220" height="145" rx="10" fill="none" stroke="#3fb950" stroke-width="1.5"/>
<text x="565" y="25" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">DETERMINISTIC ZONE</text>
<rect x="470" y="38" width="190" height="28" rx="5" fill="#0d1117" stroke="#3fb950" stroke-width="1.5"/>
<text x="565" y="56" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">Tick Runner (1 packet/tick)</text>
<line x1="565" y1="70" x2="565" y2="82" stroke="#3fb950" stroke-width="1" marker-end="url(#ag)"/>
<rect x="472" y="86" width="44" height="24" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="494" y="102" fill="#d2a8ff" font-size="8" text-anchor="middle">STT</text>
<rect x="520" y="86" width="44" height="24" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="542" y="102" fill="#d2a8ff" font-size="8" text-anchor="middle">Split</text>
<rect x="568" y="86" width="44" height="24" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="590" y="102" fill="#d2a8ff" font-size="8" text-anchor="middle">TTS</text>
<rect x="616" y="86" width="44" height="24" rx="3" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="638" y="102" fill="#d2a8ff" font-size="8" text-anchor="middle">Spkr</text>
<text x="565" y="135" fill="#d2a8ff" font-size="8" text-anchor="middle">effects &rarr;</text>
<!-- Arrow: effects to executor -->
<line x1="680" y1="98" x2="730" y2="98" stroke="#d2a8ff" stroke-width="1.5" marker-end="url(#ap)"/>
<!-- EFFECT EXECUTOR -->
<rect x="735" y="8" width="270" height="200" rx="10" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="5,3"/>
<text x="870" y="25" fill="#f85149" font-size="9" text-anchor="middle" font-weight="600">NONDETERMINISTIC</text>
<rect x="748" y="38" width="244" height="32" rx="6" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="870" y="53" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">Effect Executor</text>
<text x="870" y="64" fill="#8b949e" font-size="7" text-anchor="middle">fire-and-forget &mdash; fully parallelizable</text>
<rect x="748" y="82" width="118" height="24" rx="4" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="807" y="98" fill="#8b949e" font-size="8" text-anchor="middle">Start TTS</text>
<rect x="874" y="82" width="118" height="24" rx="4" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="933" y="98" fill="#8b949e" font-size="8" text-anchor="middle">Stop speaker</text>
<rect x="748" y="114" width="118" height="24" rx="4" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="807" y="130" fill="#8b949e" font-size="8" text-anchor="middle">Send to browser</text>
<rect x="874" y="114" width="118" height="24" rx="4" fill="#21262d" stroke="#d2a8ff" stroke-width="1"/><text x="933" y="130" fill="#8b949e" font-size="8" text-anchor="middle">Run STT model</text>
<text x="870" y="158" fill="#f85149" font-size="8" text-anchor="middle" font-weight="600">All run in parallel</text>
<text x="870" y="170" fill="#8b949e" font-size="7" text-anchor="middle">threads, tasks, workers</text>
<text x="870" y="182" fill="#8b949e" font-size="7" text-anchor="middle">order doesn't matter</text>
<!-- Feedback loop -->
<path d="M870 195 L870 240 L276 240 L276 198" fill="none" stroke="#f85149" stroke-width="1" stroke-dasharray="4,3" marker-end="url(#ar)"/>
<text x="573" y="255" fill="#f85149" font-size="8" text-anchor="middle">results feed back to dispatcher &rarr; next tick</text>
</svg>
</div>
<p>Three zones: <strong>nondeterministic inputs</strong> (mic, TTS workers, STT results, VAD signals), the <strong>deterministic zone</strong> (tick runner + pure step functions), and <strong>nondeterministic outputs</strong> (effect executor spawning background tasks). The recording point captures the dispatcher's merged output. Replay feeds the same flat packet stream back through the deterministic zone and gets identical behavior.</p>
<!-- ===== 2. DISPATCHER ===== -->
<h2 id="dispatcher">2. The Two-Lane Dispatcher <span class="badge boundary">boundary</span></h2>
<p>The dispatcher has <strong>two FIFO queues</strong>: one for signals, one for data. On each read, it drains all pending signals first, then one data packet. Signals are never blocked behind data.</p>
<div class="principle">
<p><strong>Why two lanes?</strong> TTS produces audio faster than realtime &mdash; 500 packets in ~30ms. A single FIFO would bury a barge-in signal behind all 500. Two lanes ensure the signal is seen immediately, regardless of data queue depth.</p>
</div>
<p><strong>Determinism:</strong> the merge rule is fixed &mdash; signals before data, FIFO within each lane. Given the same signal sequence and the same data sequence, the output is always identical. No timing dependency.</p>
<div class="code-block"><pre>read_batch(1):
1. Drain ALL pending signals &rarr; [UserStartedSpeaking]
2. Read 1 data packet &rarr; [OutAudio]
3. Return combined &rarr; [signal, data]</pre></div>
<div class="file-ref">
<div class="path">src/superagent/dispatcher.py</div>
<div class="desc">Two-lane Dispatcher, BatchSource protocol, thread-safe push.</div>
</div>
<!-- ===== 3. RECORDING ===== -->
<h2 id="recording">3. Recording &amp; Replay <span class="badge boundary">boundary</span></h2>
<p>Recording captures the <strong>output</strong> of the dispatcher &mdash; the merged sequence of signals + data, one packet at a time. The recording is a flat list of packets in the order they were processed. No batches, no nesting.</p>
<div class="code-block"><pre>Recording format (flat packet list):
packet 0: UserStartedSpeakingSignal
packet 1: InputAudio
packet 2: InputAudio
packet 3: EndFrame(of=InputAudio)
packet 4: TextFrame(text="Hello world.", turn=1)
packet 5: EndFrame(of=TextFrame, turn=1)
packet 6: OutputAudio(turn=1)
...
Replay: feed each packet directly to the TickHarness.
No dispatcher involved. Same inputs &rarr; same outputs.</pre></div>
<p><code>RecordingDispatcher</code> wraps the live Dispatcher and captures each packet. On replay, the recorded packets feed directly to the tick harness (step functions), bypassing the dispatcher entirely. The two-lane merge already happened at recording time &mdash; replay just processes the resulting flat stream.</p>
<div class="file-ref">
<div class="path">src/superagent/recording.py</div>
<div class="desc">Recording dataclass, RecordingDispatcher, ReplayDispatcher.</div>
</div>
<!-- ===== 4. TICK RUNNER ===== -->
<h2 id="tick-runner">4. The Tick Runner <span class="badge det">deterministic</span></h2>
<p>The core loop. Synchronous. Each tick: read one packet from the dispatcher, process it through the step function chain, fire-and-forget the resulting effects.</p>
<div class="code-block"><pre>def _tick(packet):
effects = _process_packet(packet) # pure, synchronous
executor.execute(effects) # spawns background tasks, returns immediately</pre></div>
<p><strong>Signal broadcast:</strong> signals are delivered to <strong>ALL</strong> stages. Every step function sees every signal &mdash; <code>UserStartedSpeakingSignal</code> broadcasts to STT, splitter, TTS, and speaker simultaneously. This is what triggers coordinated barge-in: TTS aborts synthesis while speaker stops playback, all from the same signal on the same tick.</p>
<p><strong>Data flow:</strong> data packets flow linearly. Each stage's output packets become the next stage's input. Effects are collected but not forwarded. One packet in, effects out. No batching, no buffering.</p>
<div class="file-ref">
<div class="path">src/superagent/batch_runner.py</div>
<div class="desc">BatchRunner (tick runner), StageConfig, _Stage, signal broadcast, single-pass processing.</div>
</div>
<!-- ===== 5. STEP FUNCTIONS ===== -->
<h2 id="step-functions">5. Step Functions <span class="badge det">deterministic</span></h2>
<p>Every processor's decision logic is a pure function: <code>(state, packet) &rarr; (new_state, [Packet | Effect])</code>. No I/O, no async, no side effects. State is a frozen dataclass. Each state carries a <code>turn</code> counter that increments on <code>UserStartedSpeakingSignal</code>. Effects are stamped with the current turn so handlers can propagate it to result packets.</p>
<div class="code-block"><pre>@dataclass(frozen=True)
class ParakeetState:
frames: tuple = ()
gated: bool = False
recording: bool = False
turn: int = 0 # increments on UserStartedSpeakingSignal
def parakeet_step(state, packet) -&gt; (new_state, outputs):
...
return (ParakeetState(..., turn=state.turn + 1), [Transcribe(..., turn=state.turn)])</pre></div>
<div class="file-ref"><div class="path">src/superagent/stt/parakeet_step.py</div><div class="desc">STT decisions. Gated (push-to-talk) + ungated modes. Turn tracking. Emits Transcribe.</div></div>
<div class="file-ref"><div class="path">src/superagent/text/splitter.py</div><div class="desc">Sentence splitting. Propagates turn from incoming TextFrame to emitted Utterances via buffer_turn.</div></div>
<div class="file-ref"><div class="path">src/superagent/tts/kokoro_step.py</div><div class="desc">TTS decisions. Two-phase (idle &rarr; synthesizing). Turn tracking. Discards stale packets. Emits StartSynthesis, FeedText, CloseInput, Abort.</div></div>
<div class="file-ref"><div class="path">src/superagent/audio/speaker_step.py</div><div class="desc">Speaker decisions. Barge-in support. Turn tracking. Discards stale packets. Emits PlayAudio, InitSpeaker, StopPlayback.</div></div>
<div class="file-ref"><div class="path">src/superagent/audio/sink_step.py</div><div class="desc">WAV file sink. Format mismatch detection. Emits WriteAudio, InitWav.</div></div>
<div class="file-ref"><div class="path">src/superagent/print_step.py</div><div class="desc">Debug printer. Emits PrintEffect.</div></div>
<!-- ===== 6. TURN NUMBERS ===== -->
<h2 id="turn-numbers">6. Turn Numbers <span class="badge det">deterministic</span></h2>
<p><strong>The problem:</strong> effects run in the background. When a Transcribe effect is in flight and the user barges in, the handler returns a TextFrame for the <em>old</em> turn. Without turn numbers, TTS would start synthesizing a stale transcript while the user is already speaking again.</p>
<div class="principle">
<p><strong>Core mechanism:</strong> each step function state has a <code>turn: int</code>. Only <code>UserStartedSpeakingSignal</code> increments it (during signal broadcast, so all stages increment in lockstep). Effects are stamped with the turn that created them. Handlers copy the turn to their result packets. Step functions compare <code>packet.turn</code> to <code>state.turn</code> and silently discard stale packets.</p>
</div>
<h3>What increments the turn</h3>
<p>Only <code>UserStartedSpeakingSignal</code>. This signal broadcasts to all stages on the same tick, so STT, TTS, and speaker all advance to the same turn number simultaneously.</p>
<h3>What happens to stale packets</h3>
<p>When a packet arrives with <code>packet.turn &lt; state.turn</code>, the step function returns <code>(state, [])</code> &mdash; silently discarded. No effects produced, no state change. The splitter propagates turn through its <code>buffer_turn</code> field so Utterances carry the correct turn.</p>
<h3>Propagation chain</h3>
<div class="code-block"><pre>UserStartedSpeakingSignal
&rarr; ParakeetState.turn++ &rarr; Transcribe(turn=N) &rarr; handler &rarr; TextFrame(turn=N)
&rarr; SplitterState.buffer_turn=N &rarr; Utterance(turn=N)
&rarr; KokoroState checks turn &rarr; StartSynthesis(turn=N) &rarr; handler &rarr; OutputAudio(turn=N)
&rarr; SpeakerState checks turn &rarr; PlayAudio(turn=N)</pre></div>
<h3>Bugs found during development</h3>
<p>Turn numbers exposed three classes of bugs. All were found by Hypothesis property tests before any manual testing could reach them.</p>
<ul>
<li><strong>ParakeetState dropping turn:</strong> the initial implementation did not carry the turn field through all state transitions. The Transcribe effect was stamped with turn 0 regardless of the actual turn. The stateful test's <code>turn_numbers_consistent</code> invariant caught this immediately.</li>
<li><strong>EndFrame missing turn:</strong> <code>EndFrame(of=TextFrame)</code> was emitted without a turn stamp. Stale EndFrames could trigger CloseInput in TTS after a barge-in. Found by code review after the stateful test caught the broader pattern.</li>
<li><strong>Splitter not propagating turn:</strong> Utterances were emitted without the incoming TextFrame's turn. Downstream stages (TTS, speaker) had no turn to check, so they accepted stale data. Found by the stateful test.</li>
</ul>
<!-- ===== 7. ERROR HANDLING ===== -->
<h2 id="error-handling">7. Error Handling</h2>
<p><code>FailedEffect</code> is a <strong>Signal</strong>. When an effect handler raises an exception, the executor pushes a <code>FailedEffect</code> to the dispatcher. Because it is a Signal, it gets two-lane priority (never buried behind data) and broadcasts to all stages on the next tick.</p>
<div class="code-block"><pre>@dataclass(frozen=True)
class FailedEffect(Signal):
effect_type: type # e.g. StartSynthesis, Transcribe, InitSpeaker
error: str
turn: int = 0 # turn that originated the failed effect</pre></div>
<h3>Step function recovery</h3>
<p>Step functions reset from active states to idle when they see a FailedEffect for their effect type. For example, KokoroState resets <code>synthesizing</code> to <code>False</code> on <code>FailedEffect(effect_type=StartSynthesis)</code>. SpeakerState resets <code>playing</code> to <code>False</code> on <code>FailedEffect(effect_type=InitSpeaker)</code>. The pipeline continues processing subsequent turns.</p>
<h3>Stale FailedEffect handling</h3>
<p>FailedEffect carries a turn number. If a FailedEffect arrives from a previous turn (<code>effect.turn &lt; state.turn</code>), the step function ignores it. This prevents a stale failure from killing an active synthesis in the current turn. This specific bug was found by the auto-generated injection framework (see section 9).</p>
<h3>Where retry/backoff lives</h3>
<p>In the handlers, not the step functions. Step functions are pure and see only success (result packets fed back to dispatcher) or final failure (FailedEffect broadcast). Handlers own the retry policy &mdash; exponential backoff, circuit breakers, etc. If all retries fail, the handler raises and the executor creates the FailedEffect.</p>
<div class="file-ref"><div class="path">src/superagent/packets.py</div><div class="desc">FailedEffect signal definition, effect_type + error + turn fields.</div></div>
<!-- ===== 8. EFFECT EXECUTOR ===== -->
<h2 id="effect-executor">8. The Effect Executor <span class="badge nondet">nondeterministic</span></h2>
<p><strong>Fire-and-forget.</strong> <code>execute()</code> is synchronous &mdash; it calls <code>asyncio.create_task()</code> for each effect and returns immediately. The tick loop is never blocked by slow I/O (network calls, model inference, device setup). This preserves signal priority in the dispatcher.</p>
<p>Handlers are <strong>classes with state</strong>. <code>TTSHandler</code> manages a <code>WorkerThread</code> for synthesis. <code>SpeakerHandler</code> manages a playback thread and audio queue. <code>TranscribeHandler</code> runs STT inference in a thread pool. Each handler receives its dispatcher reference to push result packets back.</p>
<p><code>register_many()</code> maps multiple effect types to one handler. The TTS handler receives <code>StartSynthesis</code>, <code>FeedText</code>, <code>CloseInput</code>, and <code>Abort</code> &mdash; four effect types, one stateful handler that manages the worker lifecycle across them all.</p>
<p>When a handler raises, the executor catches the exception and pushes <code>FailedEffect</code> to the dispatcher. The pipeline continues.</p>
<div class="file-ref"><div class="path">src/superagent/effect_executor.py</div><div class="desc">EffectExecutor with create_task dispatch, register/register_many, FailedEffect on exception.</div></div>
<div class="file-ref"><div class="path">src/superagent/handlers/tts.py</div><div class="desc">TTSHandler &mdash; manages WorkerThread for Kokoro synthesis across StartSynthesis/FeedText/CloseInput/Abort.</div></div>
<div class="file-ref"><div class="path">src/superagent/handlers/transcribe.py</div><div class="desc">TranscribeHandler &mdash; runs Parakeet STT inference in thread pool executor.</div></div>
<div class="file-ref"><div class="path">src/superagent/handlers/speaker.py</div><div class="desc">SpeakerHandler &mdash; manages playback thread for InitSpeaker/PlayAudio/StopPlayback.</div></div>
<!-- ===== 9. TESTING FRAMEWORK ===== -->
<h2 id="testing">9. Testing Framework (455 tests)</h2>
<p>Tests are organized from the inside out. Each layer tests one thing in isolation, building confidence from pure functions outward to the nondeterministic boundary. The test suite uses <a href="https://hypothesis.readthedocs.io/">Hypothesis</a> for property-based testing throughout.</p>
<h3>Layer 1: Step function property tests (~80 tests) <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>What they test</h4>
<p>Per-function invariants: no text lost (splitter), signals always absorbed (never forwarded), state type validity after any input, barge-in lifecycle (synthesizing &rarr; Abort + idle), gated mode logic, format mismatch detection, turn tracking consistency.</p>
<p>Hypothesis generates random packets and verifies the invariant holds for all of them. This is not example-based testing &mdash; it explores the input space systematically.</p>
<p><strong>Files:</strong> <code>test_parakeet_step.py</code>, <code>test_splitter.py</code>, <code>test_kokoro_step.py</code>, <code>test_speaker_step.py</code>, <code>test_audio_sink_step.py</code>, <code>test_print_step.py</code></p>
</div>
<h3>Layer 2: Boundary tests (23 tests) <span class="badge boundary">boundary</span></h3>
<div class="testing">
<h4>What they test</h4>
<p>Adversarial inputs against each step function: empty samples, null bytes, NaN timestamps, odd byte counts (violating int16 alignment), absurd sample rates. Known good state + bad data &rarr; no crash, no corruption.</p>
<p><strong>File:</strong> <code>test_boundary.py</code></p>
</div>
<h3>Layer 3: Dispatcher property tests (9 tests) <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>What they test</h4>
<p>Signals always before data. FIFO within each lane. No packets lost. Empty after drain.</p>
<p><strong>File:</strong> <code>test_dispatcher.py</code></p>
</div>
<h3>Layer 4: TickHarness &mdash; deterministic integration (24 tests) <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>What it is</h4>
<p>A synchronous test harness that processes a flat packet stream directly through the step function chain. <strong>No dispatcher, no executor.</strong> This is exactly what replay looks like &mdash; a list of packets processed one at a time through the stages, collecting effects without executing them.</p>
<p><strong>Why bypass the dispatcher?</strong> Because the recording captures the dispatcher's output. Testing should exercise the deterministic zone in isolation. The TickHarness processes the same flat stream that replay would, verifying that step function composition produces correct effects for any generated session.</p>
<p><strong>File:</strong> <code>test_deterministic_integration.py</code></p>
</div>
<h3>Layer 5: Session generators <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>What they are</h4>
<p>Composable Hypothesis strategies that produce flat packet streams modeling what the dispatcher would emit. Each strategy represents a different scenario:</p>
<ul>
<li><code>successful_turn()</code> &mdash; user speaks, STT returns transcript, TTS produces audio</li>
<li><code>failed_stt_turn()</code> &mdash; user speaks, STT fails (FailedEffect instead of transcript)</li>
<li><code>failed_tts_turn()</code> &mdash; transcript arrives, TTS fails</li>
<li><code>failed_speaker_turn()</code> &mdash; speaker device fails mid-playback</li>
<li><code>barge_in_turn()</code> &mdash; agent responding, user interrupts, new turn begins</li>
<li><code>full_loop_sessions()</code> &mdash; 1-5 turns, random mix of the above</li>
<li><code>interleaved_session()</code> &mdash; concurrent sources interleaved (models real dispatcher behavior)</li>
<li><code>full_loop_with_adversarial()</code> &mdash; realistic turns + injected bad packets + random failures</li>
</ul>
<p><strong>File:</strong> <code>tests/strategies.py</code></p>
</div>
<h3>Layer 6: Rule-based stateful test <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>What it is</h4>
<p>Models the conversation as a state machine: <code>IDLE &rarr; RECORDING &rarr; WAITING_FOR_STT &rarr; AGENT_SPEAKING &rarr; IDLE</code>. Hypothesis explores random paths through this state machine, choosing rules at each step. Each rule maps to a real action (user starts speaking, audio chunks arrive, STT succeeds/fails, TTS output arrives, barge-in, etc.).</p>
<h4>How it works</h4>
<p>200 Hypothesis paths, each up to 40 steps. At each step Hypothesis chooses a rule. Rules have preconditions (e.g., "only send audio if RECORDING"). Invariants are checked after every step:</p>
<ul>
<li>Turn numbers consistent across all stages</li>
<li>No stale packets trigger effects</li>
<li>STT in gated mode when recording</li>
<li>Effects list contains no signals</li>
<li>All state types valid</li>
</ul>
<h4>Stale packet injection</h4>
<p>Dedicated rules inject stale transcripts and stale OutputAudio from previous turns at arbitrary points. The stateful test found three turn-propagation bugs before manual testing could reach them.</p>
<h4>Regression dump on failure</h4>
<p>When an assertion fails, the test's <code>teardown()</code> writes the full packet trace to <code>tests/regressions/</code> with a descriptive filename derived from the assertion message (e.g., <code>stale_failedeffect_startsynthesis_from_turn_0_killed_synthesis_in_turn_1.py</code>). These files are automatically picked up by the regression harness.</p>
<p><strong>File:</strong> <code>test_stateful.py</code></p>
</div>
<h3>Layer 7: Auto-generated injection framework <span class="badge det">deterministic</span></h3>
<div class="testing">
<h4>The key testing innovation</h4>
<p>This framework auto-generates guarded injection rules by inspecting step function state dataclasses. When you add a new step function, you get injection coverage for free &mdash; no manual test code needed.</p>
<h4>The insight: precondition injections on active state</h4>
<p>Step functions can be in "active" states (synthesizing, playing, recording, buffering). Various classes of unexpected packets can arrive while a step function is active. The question is: does the active state survive the injection?</p>
<p>The framework discovers active conditions automatically by inspecting state dataclasses. Any field whose current value differs from its default means the step function is "active."</p>
<h4>How it works</h4>
<ol>
<li><strong>Discover active conditions:</strong> inspect each state dataclass's fields. For each field with a default value, record the (stage, field, default) triple. Example: <code>KokoroState.synthesizing</code> has default <code>False</code>. When it's <code>True</code>, TTS is active.</li>
<li><strong>Cross with injection classes:</strong> for each active condition, generate one rule per injection class. Each rule has a precondition: "this field is non-default." The rule injects a bad packet and checks the result.</li>
<li><strong>Hypothesis explores naturally:</strong> the conversation flow rules (same as the stateful test) drive the system to active states. When an active condition is reached, the injection rule fires automatically.</li>
</ol>
<h4>Why split compound rules</h4>
<p>Early versions used one big rule per injection class ("inject stale FailedEffect whenever any field is active"). This was too coarse &mdash; Hypothesis couldn't tell which field was affected, and shrinking was poor. Splitting to one rule per (field, injection class) gives Hypothesis precise targets: it can explore each active condition independently and produce minimal failing examples.</p>
<h4>The INJECTION_CLASSES table</h4>
<table>
<tr><th>Class</th><th>What it injects</th><th>Strict?</th><th>Assertion</th></tr>
<tr><td><code>stale_failed_effect</code></td><td>FailedEffect from turn N-1</td><td>Yes</td><td>Field unchanged</td></tr>
<tr><td><code>stale_transcript</code></td><td>TextFrame from turn N-1</td><td>Yes</td><td>Field unchanged</td></tr>
<tr><td><code>stale_output_audio</code></td><td>OutputAudio from turn N-1</td><td>Yes</td><td>Field unchanged</td></tr>
<tr><td><code>unexpected_input_audio</code></td><td>InputAudio during active state</td><td>No</td><td>State still valid (STT legitimately accumulates)</td></tr>
<tr><td><code>unexpected_end_frame</code></td><td>EndFrame arriving unexpectedly</td><td>No</td><td>State still valid (may trigger legitimate transitions)</td></tr>
</table>
<p><strong>Strict</strong> means the field must not change (for stale/disruptive packets). <strong>Non-strict</strong> means the state must still be a valid type (for packets that may legitimately be accepted, like InputAudio by STT).</p>
<h4>Bugs found by the injection framework</h4>
<ol>
<li><strong>STT accumulating OutputAudio:</strong> <code>ParakeetState</code> was accepting OutputAudio frames in its <code>frames</code> tuple during active recording. The <code>unexpected_input_audio</code> injection revealed that non-InputAudio frames were being accumulated when they shouldn't be.</li>
<li><strong>Splitter accepting stale text:</strong> the splitter was processing TextFrame packets without checking the turn number. Stale transcripts from previous turns were being split and forwarded as Utterances. The <code>stale_transcript</code> injection caught this.</li>
<li><strong>Stale FailedEffect killing synthesis:</strong> a <code>FailedEffect(effect_type=StartSynthesis, turn=0)</code> arriving during turn 1 was resetting <code>KokoroState.synthesizing</code> to <code>False</code>, killing an active synthesis in the current turn. The <code>stale_failed_effect</code> injection caught this. Two regression files were auto-generated.</li>
</ol>
<h4>What's automatic vs manual</h4>
<ul>
<li><strong>Automatic:</strong> adding a step function to <code>STAGE_CONFIGS</code> generates all injection rules for all its fields. Regression files are auto-generated on failure with descriptive filenames. No manual test code needed for injection coverage.</li>
<li><strong>Manual:</strong> adding a new injection class (a new type of bad packet to inject). Add a tuple to <code>INJECTION_CLASSES</code> with a name, description, packet factory, and strict flag.</li>
</ul>
<h4>Example: how adding a new step function gets coverage for free</h4>
<div class="code-block"><pre># In test_stale_injection.py, just add to STAGE_CONFIGS:
STAGE_CONFIGS = [
StageConfig("stt", parakeet_step, ParakeetState()),
StageConfig("splitter", splitter_step, SplitterState()),
StageConfig("tts", ..., KokoroState()),
StageConfig("speaker", ..., SpeakerState()),
StageConfig("my_new_step", my_step_fn, MyState()), # &larr; add this
]
# MyState has fields like:
@dataclass(frozen=True)
class MyState:
processing: bool = False # active condition discovered automatically
buffer: tuple = () # active condition discovered automatically
turn: int = 0 # skipped (in _SKIP_FIELDS)
# Result: 5 injection classes &times; 2 active fields = 10 new rules generated.
# Hypothesis explores conversation flow to reach active states, then injects.</pre></div>
<p><strong>File:</strong> <code>test_stale_injection.py</code> (2000 Hypothesis examples, 50 steps each)</p>
</div>
<h3>Layer 8: Regression harness</h3>
<div class="testing">
<h4>What it is</h4>
<p>Auto-discovers <code>.py</code> files in <code>tests/regressions/</code> that define a <code>PACKETS</code> list. Replays each through the pipeline and checks invariants (valid state types, no signals in effects).</p>
<h4>Current regressions</h4>
<ul>
<li><code>stale_failure_kills_synthesis.py</code> &mdash; stale FailedEffect disrupting active synthesis</li>
<li><code>stale_failedeffect_startsynthesis_from_turn_0_killed_synthesis_in_turn_1.py</code></li>
<li><code>stale_failedeffect_startsynthesis_from_turn_1_killed_synthesis_in_turn_2.py</code></li>
</ul>
<p>To add a regression manually: drop a <code>.py</code> file in <code>tests/regressions/</code> with a <code>PACKETS</code> list. The harness picks it up automatically.</p>
<p><strong>Files:</strong> <code>test_regressions.py</code>, <code>tests/regressions/*.py</code></p>
</div>
<!-- ===== 10. DEVELOPER GUIDE ===== -->
<h2 id="developer-guide">10. Developer Guide: Adding a New Step Function</h2>
<h3>Step 1: Write the step function</h3>
<p>Signature: <code>(state, packet) &rarr; (new_state, [Packet | Effect])</code>. State must be a frozen dataclass. Include a <code>turn: int = 0</code> field if the step function can be "active" (processing data across ticks).</p>
<h3>Step 2: Handle UserStartedSpeakingSignal</h3>
<p>Increment <code>turn</code> in the new state. This keeps the stage in lockstep with all others. Every stage must see the same turn number at all times.</p>
<h3>Step 3: Handle FailedEffect</h3>
<p>Check <code>effect.turn &gt;= state.turn</code> before resetting active state. Stale failures from previous turns must be ignored &mdash; otherwise they'll kill active work in the current turn.</p>
<h3>Step 4: Check turn on incoming data</h3>
<p>If <code>packet.turn &lt; state.turn</code>, return <code>(state, [])</code> &mdash; silently discard. This prevents stale results from triggering new work.</p>
<h3>Step 5: Add to STAGE_CONFIGS</h3>
<p>Add to <code>batch_runner.py</code> for production and to <code>test_stale_injection.py</code> for testing. All injection tests are auto-generated from the state dataclass &mdash; no manual test code needed.</p>
<h3>Step 6: Add conversation flow rules</h3>
<p>If the step function introduces new session states or transitions, add corresponding rules to <code>test_stateful.py</code>. The stateful test models the conversation as a state machine and needs to know about new states.</p>
<h3>Step 7: Register handler</h3>
<p>Register the effect handler in the effect executor if the step function emits new effect types. Use <code>register_many()</code> if one handler handles multiple effect types.</p>
<div class="principle">
<p><strong>What's automatic:</strong> all 5 injection classes &times; all active fields = full injection coverage. Regression file generation on any failure. Turn consistency invariant in the stateful test. You only write the step function, handler, and flow rules.</p>
</div>
<!-- ===== 11. CODE MAP ===== -->
<h2 id="code-map">11. Code Map</h2>
<h3>Core pipeline</h3>
<div class="file-ref"><div class="path">src/superagent/dispatcher.py</div><div class="desc">Two-lane FIFO dispatcher + BatchSource protocol</div></div>
<div class="file-ref"><div class="path">src/superagent/batch_runner.py</div><div class="desc">Tick runner: read packet &rarr; step functions &rarr; execute effects</div></div>
<div class="file-ref"><div class="path">src/superagent/effect_executor.py</div><div class="desc">Fire-and-forget executor with create_task dispatch, register_many, FailedEffect on exception</div></div>
<div class="file-ref"><div class="path">src/superagent/effects.py</div><div class="desc">Base Effect class with turn field</div></div>
<div class="file-ref"><div class="path">src/superagent/packets.py</div><div class="desc">Packet, Signal, Frame, TextFrame, EndFrame, Utterance, UserStartedSpeakingSignal, FailedEffect</div></div>
<div class="file-ref"><div class="path">src/superagent/recording.py</div><div class="desc">RecordingDispatcher + ReplayDispatcher</div></div>
<h3>Step functions</h3>
<div class="file-ref"><div class="path">src/superagent/stt/parakeet_step.py</div><div class="desc">STT step function with turn tracking</div></div>
<div class="file-ref"><div class="path">src/superagent/text/splitter.py</div><div class="desc">Sentence splitter with turn propagation via buffer_turn</div></div>
<div class="file-ref"><div class="path">src/superagent/tts/kokoro_step.py</div><div class="desc">TTS step function with turn tracking and stale packet discard</div></div>
<div class="file-ref"><div class="path">src/superagent/audio/speaker_step.py</div><div class="desc">Speaker step function with turn tracking and stale packet discard</div></div>
<div class="file-ref"><div class="path">src/superagent/audio/sink_step.py</div><div class="desc">WAV file sink step function</div></div>
<div class="file-ref"><div class="path">src/superagent/print_step.py</div><div class="desc">Debug printer step function</div></div>
<h3>Handlers (nondeterministic boundary)</h3>
<div class="file-ref"><div class="path">src/superagent/handlers/tts.py</div><div class="desc">TTSHandler &mdash; WorkerThread lifecycle for Kokoro synthesis</div></div>
<div class="file-ref"><div class="path">src/superagent/handlers/transcribe.py</div><div class="desc">TranscribeHandler &mdash; STT inference in thread pool</div></div>
<div class="file-ref"><div class="path">src/superagent/handlers/speaker.py</div><div class="desc">SpeakerHandler &mdash; playback thread for audio output</div></div>
<h3>Adapters (nondeterministic boundary)</h3>
<div class="file-ref"><div class="path">src/superagent/adapters/base.py</div><div class="desc">AsyncIterableAdapter, ListAdapter &mdash; push packets from external sources into the dispatcher</div></div>
<div class="file-ref"><div class="path">src/superagent/adapters/mic.py</div><div class="desc">Microphone source adapter</div></div>
<div class="file-ref"><div class="path">src/superagent/adapters/keyboard.py</div><div class="desc">Keyboard source adapter (push-to-talk)</div></div>
<h3>Example</h3>
<div class="file-ref"><div class="path">examples/sttts_deterministic.py</div><div class="desc">Push-to-talk demo: mic &rarr; STT &rarr; splitter &rarr; TTS &rarr; speaker. Live timing: 5-40 &mu;s/tick, 100-175 ms STT, &lt;200 ms barge-in.</div></div>
<h3>Retained for reference</h3>
<div class="file-ref"><div class="path">src/superagent/effect_resolver.py</div><div class="desc">Effect cancellation logic &mdash; proved redundant with two-lane dispatcher (see changelog)</div></div>
<h3>Tests</h3>
<div class="file-ref"><div class="path">tests/test_parakeet_step.py</div><div class="desc">STT step function property tests</div></div>
<div class="file-ref"><div class="path">tests/test_splitter.py</div><div class="desc">Splitter step function property tests</div></div>
<div class="file-ref"><div class="path">tests/test_kokoro_step.py</div><div class="desc">TTS step function property tests</div></div>
<div class="file-ref"><div class="path">tests/test_speaker_step.py</div><div class="desc">Speaker step function property tests</div></div>
<div class="file-ref"><div class="path">tests/test_audio_sink_step.py</div><div class="desc">Audio sink step function property tests</div></div>
<div class="file-ref"><div class="path">tests/test_print_step.py</div><div class="desc">Print step function tests</div></div>
<div class="file-ref"><div class="path">tests/test_boundary.py</div><div class="desc">Adversarial input tests for all step functions</div></div>
<div class="file-ref"><div class="path">tests/test_dispatcher.py</div><div class="desc">Dispatcher property tests (two-lane ordering)</div></div>
<div class="file-ref"><div class="path">tests/test_deterministic_integration.py</div><div class="desc">TickHarness integration &mdash; flat packet streams, no dispatcher/executor</div></div>
<div class="file-ref"><div class="path">tests/test_stateful.py</div><div class="desc">Rule-based stateful test &mdash; conversation state machine with stale packet injection</div></div>
<div class="file-ref"><div class="path">tests/test_stale_injection.py</div><div class="desc">Auto-generated injection framework &mdash; inspects state dataclasses, generates guarded rules</div></div>
<div class="file-ref"><div class="path">tests/test_regressions.py</div><div class="desc">Regression replay harness &mdash; auto-discovers packet streams in tests/regressions/</div></div>
<div class="file-ref"><div class="path">tests/test_failure_integration.py</div><div class="desc">FailedEffect propagation through dispatcher + executor</div></div>
<div class="file-ref"><div class="path">tests/test_recording.py</div><div class="desc">Record/replay identity tests</div></div>
<div class="file-ref"><div class="path">tests/test_effect_resolver.py</div><div class="desc">Effect resolver property tests (retained for reference)</div></div>
<div class="file-ref"><div class="path">tests/test_batch_runner.py</div><div class="desc">Tick runner with mock step functions</div></div>
<div class="file-ref"><div class="path">tests/strategies.py</div><div class="desc">Hypothesis strategies: packets, sessions, adversarial data, session generators</div></div>
<div class="file-ref"><div class="path">tests/regressions/</div><div class="desc">Auto-generated regression packet streams (3 files)</div></div>
<div class="file-ref"><div class="path">tests/helpers.py</div><div class="desc">TickHarness and shared test utilities</div></div>
<!-- ===== 12. CHANGELOG ===== -->
<div class="changelog" id="changelog">
<h3>12. Design Changelog</h3>
<p><strong>v1: Single dumb FIFO dispatcher.</strong> All packets in one queue, no priority. Rejected because TTS produces audio faster than realtime &mdash; a barge-in signal would be buried behind hundreds of audio packets.</p>
<p><strong>v2: Two-lane dispatcher + batching + effect resolver.</strong> Signals and data in separate FIFOs. Configurable batch size. Effect resolver cancelled effects within a batch (e.g., StopPlayback cancelled preceding PlayAudio). Rejected after proving batch/resolver complexity redundant with two-lane priority.</p>
<p><strong>v3: Two-lane dispatcher + single-packet ticks.</strong> No batching, no effect resolver. With signals always arriving first, effects are in the correct order by construction. Proved redundancy via property tests comparing pipeline behavior with and without the resolver across hundreds of generated scenarios. Simplest correct architecture.</p>
<p><strong>v4: Turn numbers, FailedEffect, real handlers, stateful tests.</strong> Added turn tracking to all step functions and effects. Added FailedEffect as a Signal for error propagation. Added real effect handlers (TTSHandler, TranscribeHandler, SpeakerHandler). Added source adapters (mic, keyboard). Added rule-based stateful testing that models the conversation state machine &mdash; found three turn-propagation bugs. Added composable session generators for deterministic integration testing via TickHarness.</p>
<p><strong>v5 (current): Auto-generated injection framework, turn numbers in splitter, STT InputAudio fix.</strong> Added the injection framework that inspects state dataclasses and generates guarded injection rules automatically. Found three new bugs: STT accumulating OutputAudio, splitter accepting stale text, stale FailedEffect killing synthesis. Added regression harness with auto-dump on failure. 455 tests total.</p>
</div>
</body>
</html>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Merge Primitive: Design Alternatives</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif; background: #0d1117; color: #c9d1d9; line-height: 1.6; padding: 2rem; max-width: 1200px; margin: 0 auto; }
h1 { color: #58a6ff; font-size: 1.8rem; margin-bottom: 0.5rem; }
h2 { color: #79c0ff; font-size: 1.3rem; margin: 2rem 0 1rem; border-bottom: 1px solid #21262d; padding-bottom: 0.5rem; }
h3 { color: #d2a8ff; font-size: 1.1rem; margin: 1.5rem 0 0.75rem; }
p, li { color: #8b949e; margin-bottom: 0.75rem; }
.subtitle { color: #8b949e; font-size: 1rem; margin-bottom: 2rem; }
.problem-box { background: #161b22; border: 1px solid #f85149; border-radius: 8px; padding: 1.5rem; margin: 1.5rem 0; }
.problem-box h3 { color: #f85149; margin-top: 0; }
.diagram-container { background: #161b22; border: 1px solid #30363d; border-radius: 12px; padding: 2rem; margin: 1.5rem 0; overflow-x: auto; }
svg { display: block; margin: 0 auto; }
.approach { background: #161b22; border: 1px solid #30363d; border-radius: 12px; padding: 1.5rem; margin: 1.5rem 0; }
.approach.current { border-color: #f0883e; }
.approach.recommended { border-color: #3fb950; }
.tag { display: inline-block; font-size: 0.75rem; font-weight: 600; padding: 0.15rem 0.5rem; border-radius: 10px; margin-left: 0.5rem; vertical-align: middle; }
.tag.current { background: #f0883e22; color: #f0883e; border: 1px solid #f0883e55; }
.tag.alt { background: #58a6ff22; color: #58a6ff; border: 1px solid #58a6ff55; }
.tag.recommended { background: #3fb95022; color: #3fb950; border: 1px solid #3fb95055; }
.pros-cons { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin-top: 1rem; }
.pros h4 { color: #3fb950; font-size: 0.9rem; margin-bottom: 0.5rem; }
.cons h4 { color: #f85149; font-size: 0.9rem; margin-bottom: 0.5rem; }
.pros li, .cons li { font-size: 0.9rem; margin-bottom: 0.4rem; }
ul { padding-left: 1.25rem; }
.verdict { background: #161b22; border: 1px solid #3fb950; border-radius: 12px; padding: 1.5rem; margin: 2rem 0; }
.verdict h2 { color: #3fb950; border: none; margin-top: 0; }
.layer-label { font-size: 0.7rem; font-weight: 600; letter-spacing: 0.05em; }
code { background: #21262d; padding: 0.15rem 0.4rem; border-radius: 4px; font-size: 0.85rem; color: #79c0ff; }
.compare-table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 0.9rem; }
.compare-table th { text-align: left; padding: 0.75rem; background: #21262d; color: #79c0ff; border-bottom: 2px solid #30363d; }
.compare-table td { padding: 0.75rem; border-bottom: 1px solid #21262d; }
.compare-table tr:hover td { background: #1c2129; }
.check { color: #3fb950; }
.cross { color: #f85149; }
.maybe { color: #f0883e; }
</style>
</head>
<body>
<h1>The Merge Primitive: Design Alternatives</h1>
<p class="subtitle">How should a processor consume upstream packets AND produce worker output at the same time?</p>
<div class="problem-box">
<h3>The Problem</h3>
<p>KokoroTTS receives <code>TextFrame</code> packets from upstream and needs to simultaneously emit <code>OutputAudio</code> from its synthesis worker thread. A processor's <code>run()</code> method iterates one stream &mdash; but the data comes from two sources.</p>
<p>Similarly, MicSource needs to inject audio frames from a hardware capture thread into the stream while still receiving signals (for interruption).</p>
</div>
<h2>How packets flow through a processor</h2>
<div class="diagram-container">
<svg width="800" height="180" viewBox="0 0 800 180">
<!-- Pipeline queue labels -->
<text x="60" y="25" fill="#8b949e" font-size="11" text-anchor="middle" class="layer-label">UPSTREAM QUEUE</text>
<text x="740" y="25" fill="#8b949e" font-size="11" text-anchor="middle" class="layer-label">DOWNSTREAM QUEUE</text>
<!-- Upstream queue -->
<rect x="10" y="35" width="100" height="50" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="60" y="58" fill="#58a6ff" font-size="11" text-anchor="middle">TextFrame</text>
<text x="60" y="73" fill="#58a6ff" font-size="11" text-anchor="middle">Signal</text>
<!-- Arrow -->
<line x1="115" y1="60" x2="195" y2="60" stroke="#30363d" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- InterruptibleStream -->
<rect x="200" y="35" width="140" height="50" rx="6" fill="#21262d" stroke="#f0883e" stroke-width="1.5"/>
<text x="270" y="55" fill="#f0883e" font-size="10" text-anchor="middle">InterruptibleStream</text>
<text x="270" y="70" fill="#8b949e" font-size="9" text-anchor="middle">signals have priority</text>
<!-- Arrow -->
<line x1="345" y1="60" x2="395" y2="60" stroke="#30363d" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Processor run() -->
<rect x="400" y="25" width="200" height="70" rx="8" fill="#1c2129" stroke="#d2a8ff" stroke-width="2"/>
<text x="500" y="50" fill="#d2a8ff" font-size="12" text-anchor="middle" font-weight="600">Processor.run()</text>
<text x="500" y="68" fill="#8b949e" font-size="9" text-anchor="middle">async for packet in stream:</text>
<text x="500" y="80" fill="#8b949e" font-size="9" text-anchor="middle"> yield output_packet</text>
<!-- Arrow -->
<line x1="605" y1="60" x2="685" y2="60" stroke="#30363d" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Downstream queue -->
<rect x="690" y="35" width="100" height="50" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="740" y="58" fill="#3fb950" font-size="11" text-anchor="middle">OutputAudio</text>
<text x="740" y="73" fill="#3fb950" font-size="11" text-anchor="middle">TextFrame</text>
<!-- The problem: worker thread -->
<rect x="430" y="120" width="140" height="40" rx="6" fill="#21262d" stroke="#f85149" stroke-width="1.5" stroke-dasharray="5,3"/>
<text x="500" y="140" fill="#f85149" font-size="10" text-anchor="middle">Worker Thread</text>
<text x="500" y="153" fill="#f85149" font-size="9" text-anchor="middle">(produces OutputAudio)</text>
<!-- Problem arrow -->
<line x1="500" y1="120" x2="500" y2="98" stroke="#f85149" stroke-width="1.5" stroke-dasharray="4,3" marker-end="url(#arrow-red)"/>
<text x="540" y="112" fill="#f85149" font-size="9">How does this</text>
<text x="540" y="122" fill="#f85149" font-size="9">get into run()?</text>
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#30363d"/>
</marker>
<marker id="arrow-red" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#f85149"/>
</marker>
<marker id="arrow-green" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#3fb950"/>
</marker>
<marker id="arrow-orange" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#f0883e"/>
</marker>
<marker id="arrow-purple" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#d2a8ff"/>
</marker>
</defs>
</svg>
</div>
<!-- ============ APPROACH A: MERGE (CURRENT) ============ -->
<div class="approach current">
<h3>Approach A: <code>stream.merge(worker)</code> <span class="tag current">Current Design</span></h3>
<p>The stream itself learns how to interleave two data sources. Worker output is injected at the stream level, so <code>run()</code> sees both upstream packets and worker output in the same <code>async for</code> loop.</p>
<div class="diagram-container">
<svg width="800" height="240" viewBox="0 0 800 240">
<!-- Upstream -->
<rect x="10" y="45" width="90" height="40" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="55" y="63" fill="#58a6ff" font-size="10" text-anchor="middle">TextFrame</text>
<text x="55" y="76" fill="#58a6ff" font-size="9" text-anchor="middle">Signal</text>
<!-- Arrow to stream -->
<line x1="105" y1="65" x2="165" y2="65" stroke="#58a6ff" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Worker -->
<rect x="10" y="130" width="90" height="40" rx="6" fill="#21262d" stroke="#f0883e" stroke-width="1.5"/>
<text x="55" y="148" fill="#f0883e" font-size="10" text-anchor="middle">Worker</text>
<text x="55" y="161" fill="#f0883e" font-size="9" text-anchor="middle">OutputAudio</text>
<!-- Arrow to stream -->
<line x1="105" y1="150" x2="165" y2="100" stroke="#f0883e" stroke-width="1.5" marker-end="url(#arrow-orange)"/>
<!-- InterruptibleStream with merge -->
<rect x="170" y="45" width="200" height="80" rx="8" fill="#1c2129" stroke="#f0883e" stroke-width="2"/>
<text x="270" y="68" fill="#f0883e" font-size="11" text-anchor="middle" font-weight="600">InterruptibleStream</text>
<text x="270" y="85" fill="#8b949e" font-size="9" text-anchor="middle">merge() interleaves sources</text>
<text x="270" y="100" fill="#8b949e" font-size="9" text-anchor="middle">whichever has data first wins</text>
<text x="270" y="115" fill="#f0883e" font-size="9" text-anchor="middle">signals still have priority</text>
<!-- Arrow to run -->
<line x1="375" y1="85" x2="425" y2="85" stroke="#30363d" stroke-width="1.5" marker-end="url(#arrow)"/>
<text x="400" y="78" fill="#8b949e" font-size="8" text-anchor="middle">mixed</text>
<!-- Processor run() -->
<rect x="430" y="50" width="220" height="110" rx="8" fill="#1c2129" stroke="#d2a8ff" stroke-width="2"/>
<text x="540" y="73" fill="#d2a8ff" font-size="11" text-anchor="middle" font-weight="600">run(stream)</text>
<text x="540" y="92" fill="#8b949e" font-size="9" text-anchor="middle">async for packet in stream:</text>
<text x="540" y="106" fill="#79c0ff" font-size="9" text-anchor="middle"> if TextFrame: w.put(text)</text>
<text x="540" y="120" fill="#3fb950" font-size="9" text-anchor="middle"> if OutputAudio: yield it</text>
<text x="540" y="134" fill="#f85149" font-size="9" text-anchor="middle"> if EndFrame: w.close()</text>
<text x="540" y="148" fill="#8b949e" font-size="9" text-anchor="middle"> else: yield packet</text>
<!-- Arrow to output -->
<line x1="655" y1="85" x2="720" y2="85" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<!-- Output -->
<rect x="725" y="60" width="65" height="50" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="757" y="82" fill="#3fb950" font-size="9" text-anchor="middle">Output</text>
<text x="757" y="95" fill="#3fb950" font-size="9" text-anchor="middle">packets</text>
<!-- Layer bracket -->
<text x="270" y="30" fill="#f0883e" font-size="9" text-anchor="middle" font-weight="600">MERGING HAPPENS HERE (stream layer)</text>
<line x1="170" y1="35" x2="370" y2="35" stroke="#f0883e" stroke-width="1" stroke-dasharray="3,2"/>
</svg>
</div>
<div class="pros-cons">
<div class="pros">
<h4>Strengths</h4>
<ul>
<li><code>run()</code> has a single iteration loop &mdash; simple control flow</li>
<li>Works well for sources (MicSource) where the worker IS the data</li>
<li>Signal priority is preserved automatically</li>
<li>Context manager cleanup is automatic</li>
</ul>
</div>
<div class="cons">
<h4>Concerns</h4>
<ul>
<li>Stream infrastructure becomes complex (push + merge + signal priority)</li>
<li><code>run()</code> must distinguish upstream packets from worker output by type</li>
<li>Hard to test &mdash; merge is an async stream primitive, can't test without async infra</li>
<li>Every bidirectional processor needs to understand stream context managers</li>
</ul>
</div>
</div>
</div>
<!-- ============ APPROACH B: EFFECT HANDLER YIELDS ============ -->
<div class="approach recommended">
<h3>Approach B: Effect handler yields packets <span class="tag recommended">Recommended for processors</span></h3>
<p>The step function returns an <code>Effect</code> describing the work. The effect handler (in <code>StepProcessor</code>) executes it and yields resulting packets. The merging happens inside the processor's run loop, not the stream.</p>
<div class="diagram-container">
<svg width="800" height="280" viewBox="0 0 800 280">
<!-- Upstream -->
<rect x="10" y="55" width="90" height="40" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="55" y="73" fill="#58a6ff" font-size="10" text-anchor="middle">TextFrame</text>
<text x="55" y="86" fill="#58a6ff" font-size="9" text-anchor="middle">Signal</text>
<!-- Arrow to stream -->
<line x1="105" y1="75" x2="155" y2="75" stroke="#58a6ff" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- InterruptibleStream (simple) -->
<rect x="160" y="55" width="120" height="40" rx="6" fill="#21262d" stroke="#30363d" stroke-width="1.5"/>
<text x="220" y="73" fill="#8b949e" font-size="10" text-anchor="middle">Stream</text>
<text x="220" y="86" fill="#8b949e" font-size="8" text-anchor="middle">(unchanged)</text>
<!-- Arrow to StepProcessor -->
<line x1="285" y1="75" x2="325" y2="75" stroke="#30363d" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- StepProcessor run() - big box -->
<rect x="330" y="20" width="340" height="230" rx="10" fill="#0d1117" stroke="#3fb950" stroke-width="2"/>
<text x="500" y="42" fill="#3fb950" font-size="11" text-anchor="middle" font-weight="600">StepProcessor.run()</text>
<!-- Step function -->
<rect x="350" y="55" width="140" height="55" rx="6" fill="#21262d" stroke="#d2a8ff" stroke-width="1.5"/>
<text x="420" y="73" fill="#d2a8ff" font-size="10" text-anchor="middle" font-weight="600">step_fn(state, pkt)</text>
<text x="420" y="88" fill="#8b949e" font-size="8" text-anchor="middle">pure, synchronous</text>
<text x="420" y="100" fill="#8b949e" font-size="8" text-anchor="middle">returns (state, outputs)</text>
<!-- outputs arrow splitting -->
<line x1="420" y1="115" x2="420" y2="135" stroke="#d2a8ff" stroke-width="1.5"/>
<!-- Branch: Packet -->
<line x1="420" y1="135" x2="370" y2="155" stroke="#3fb950" stroke-width="1.5"/>
<rect x="345" y="155" width="65" height="30" rx="4" fill="#21262d" stroke="#3fb950" stroke-width="1"/>
<text x="377" y="174" fill="#3fb950" font-size="9" text-anchor="middle">Packet?</text>
<line x1="377" y1="185" x2="377" y2="210" stroke="#3fb950" stroke-width="1" marker-end="url(#arrow-green)"/>
<text x="377" y="225" fill="#3fb950" font-size="9" text-anchor="middle">yield it</text>
<!-- Branch: Effect -->
<line x1="420" y1="135" x2="530" y2="155" stroke="#f0883e" stroke-width="1.5"/>
<rect x="500" y="155" width="65" height="30" rx="4" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="532" y="174" fill="#f0883e" font-size="9" text-anchor="middle">Effect?</text>
<!-- Effect handler -->
<line x1="532" y1="185" x2="532" y2="195" stroke="#f0883e" stroke-width="1"/>
<rect x="495" y="195" width="150" height="45" rx="4" fill="#21262d" stroke="#f0883e" stroke-width="1.5"/>
<text x="570" y="212" fill="#f0883e" font-size="9" text-anchor="middle">effect_handler(effect)</text>
<text x="570" y="225" fill="#3fb950" font-size="8" text-anchor="middle">async for pkt: yield pkt</text>
<text x="570" y="235" fill="#8b949e" font-size="7" text-anchor="middle">(worker runs here, yields OutputAudio)</text>
<!-- Output arrow -->
<line x1="675" y1="120" x2="720" y2="120" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<!-- Output -->
<rect x="725" y="95" width="65" height="50" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="757" y="117" fill="#3fb950" font-size="9" text-anchor="middle">Output</text>
<text x="757" y="130" fill="#3fb950" font-size="9" text-anchor="middle">packets</text>
<!-- Layer bracket -->
<text x="500" y="265" fill="#3fb950" font-size="9" text-anchor="middle" font-weight="600">MERGING HAPPENS HERE (processor layer)</text>
<line x1="330" y1="255" x2="670" y2="255" stroke="#3fb950" stroke-width="1" stroke-dasharray="3,2"/>
</svg>
</div>
<div class="pros-cons">
<div class="pros">
<h4>Strengths</h4>
<ul>
<li>Stream stays simple &mdash; just delivers packets and signals</li>
<li>Step function is pure and testable without async</li>
<li>Effect handler encapsulates all worker interaction</li>
<li>No new stream primitives needed</li>
<li>Easy to mock for testing (just return packets)</li>
</ul>
</div>
<div class="cons">
<h4>Concerns</h4>
<ul>
<li>Effect handler blocks the <code>run()</code> loop while yielding &mdash; can't receive new upstream packets during synthesis</li>
<li>For true bidirectional flow (text in WHILE audio out), would need the handler to internally merge</li>
<li>Doesn't help sources (MicSource still needs stream-level injection)</li>
</ul>
</div>
</div>
</div>
<!-- ============ APPROACH C: INTERNAL QUEUE ============ -->
<div class="approach">
<h3>Approach C: Internal queue with select-style loop <span class="tag alt">Alternative</span></h3>
<p>The processor manages its own queue between the worker and <code>run()</code>. The <code>run()</code> method races upstream and the worker queue, processing whichever has data first.</p>
<div class="diagram-container">
<svg width="800" height="230" viewBox="0 0 800 230">
<!-- Upstream -->
<rect x="10" y="35" width="90" height="40" rx="6" fill="#21262d" stroke="#58a6ff" stroke-width="1.5"/>
<text x="55" y="53" fill="#58a6ff" font-size="10" text-anchor="middle">TextFrame</text>
<text x="55" y="66" fill="#58a6ff" font-size="9" text-anchor="middle">Signal</text>
<line x1="105" y1="55" x2="155" y2="55" stroke="#58a6ff" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Stream -->
<rect x="160" y="35" width="100" height="40" rx="6" fill="#21262d" stroke="#30363d" stroke-width="1.5"/>
<text x="210" y="53" fill="#8b949e" font-size="10" text-anchor="middle">Stream</text>
<text x="210" y="66" fill="#8b949e" font-size="8" text-anchor="middle">(unchanged)</text>
<!-- Processor box -->
<rect x="290" y="15" width="370" height="200" rx="10" fill="#0d1117" stroke="#58a6ff" stroke-width="2"/>
<text x="475" y="35" fill="#58a6ff" font-size="11" text-anchor="middle" font-weight="600">Processor.run()</text>
<!-- Stream input -->
<line x1="265" y1="55" x2="310" y2="70" stroke="#58a6ff" stroke-width="1.5" marker-end="url(#arrow)"/>
<!-- Select box -->
<rect x="310" y="50" width="150" height="70" rx="6" fill="#21262d" stroke="#79c0ff" stroke-width="1.5"/>
<text x="385" y="70" fill="#79c0ff" font-size="10" text-anchor="middle" font-weight="600">asyncio.wait()</text>
<text x="385" y="85" fill="#8b949e" font-size="8" text-anchor="middle">race stream vs queue</text>
<text x="385" y="100" fill="#8b949e" font-size="8" text-anchor="middle">whichever resolves first</text>
<text x="385" y="112" fill="#79c0ff" font-size="8" text-anchor="middle">signals checked each tick</text>
<!-- Worker thread -->
<rect x="340" y="150" width="120" height="40" rx="6" fill="#21262d" stroke="#f0883e" stroke-width="1.5"/>
<text x="400" y="168" fill="#f0883e" font-size="10" text-anchor="middle">Worker</text>
<text x="400" y="181" fill="#f0883e" font-size="9" text-anchor="middle">OutputAudio</text>
<!-- Queue between worker and select -->
<rect x="490" y="140" width="80" height="30" rx="4" fill="#21262d" stroke="#f0883e" stroke-width="1"/>
<text x="530" y="160" fill="#f0883e" font-size="9" text-anchor="middle">Queue</text>
<line x1="465" y1="165" x2="490" y2="155" stroke="#f0883e" stroke-width="1" marker-end="url(#arrow-orange)"/>
<line x1="530" y1="140" x2="465" y2="110" stroke="#f0883e" stroke-width="1" stroke-dasharray="3,2" marker-end="url(#arrow-orange)"/>
<!-- Output from select -->
<line x1="465" y1="85" x2="540" y2="85" stroke="#3fb950" stroke-width="1.5"/>
<text x="555" y="75" fill="#8b949e" font-size="8" text-anchor="middle">process</text>
<text x="555" y="87" fill="#8b949e" font-size="8" text-anchor="middle">& yield</text>
<line x1="575" y1="85" x2="630" y2="85" stroke="#3fb950" stroke-width="1.5"/>
<!-- Output arrow -->
<line x1="665" y1="85" x2="720" y2="85" stroke="#3fb950" stroke-width="1.5" marker-end="url(#arrow-green)"/>
<!-- Output -->
<rect x="725" y="60" width="65" height="50" rx="6" fill="#21262d" stroke="#3fb950" stroke-width="1.5"/>
<text x="757" y="82" fill="#3fb950" font-size="9" text-anchor="middle">Output</text>
<text x="757" y="95" fill="#3fb950" font-size="9" text-anchor="middle">packets</text>
<text x="475" y="225" fill="#58a6ff" font-size="9" text-anchor="middle" font-weight="600">MERGING HAPPENS HERE (processor layer, explicit)</text>
</svg>
</div>
<div class="pros-cons">
<div class="pros">
<h4>Strengths</h4>
<ul>
<li>True concurrent bidirectional flow &mdash; processes text and audio simultaneously</li>
<li>Stream stays simple</li>
<li>Explicit control over priority and scheduling</li>
</ul>
</div>
<div class="cons">
<h4>Concerns</h4>
<ul>
<li>Every bidirectional processor reimplements the select loop</li>
<li>Complex async control flow (racing, cancellation, cleanup)</li>
<li>Hard to get right &mdash; edge cases in concurrent queue draining</li>
<li>Essentially reimplements what <code>merge</code> does, but per-processor</li>
</ul>
</div>
</div>
</div>
<!-- ============ COMPARISON TABLE ============ -->
<h2>Side-by-side comparison</h2>
<table class="compare-table">
<tr>
<th>Criterion</th>
<th>A: stream.merge()</th>
<th>B: Effect handler</th>
<th>C: Internal queue</th>
</tr>
<tr>
<td>Stream complexity</td>
<td class="cross">High (push + merge + signals)</td>
<td class="check">Low (unchanged)</td>
<td class="check">Low (unchanged)</td>
</tr>
<tr>
<td>Processor complexity</td>
<td class="check">Low (single loop)</td>
<td class="check">Low (step fn + handler)</td>
<td class="cross">High (select loop)</td>
</tr>
<tr>
<td>Testability</td>
<td class="cross">Needs async infra</td>
<td class="check">Pure step fn + mock handler</td>
<td class="maybe">Needs async infra</td>
</tr>
<tr>
<td>True bidirectional</td>
<td class="check">Yes (interleaved)</td>
<td class="cross">No (handler blocks loop)</td>
<td class="check">Yes (concurrent)</td>
</tr>
<tr>
<td>Works for sources</td>
<td class="check">Yes (MicSource)</td>
<td class="cross">No (sources don't have step fns)</td>
<td class="maybe">Possible but awkward</td>
</tr>
<tr>
<td>Reusable pattern</td>
<td class="check">One primitive, all processors</td>
<td class="check">Factory handles it</td>
<td class="cross">Reimplemented per processor</td>
</tr>
<tr>
<td>Signal handling</td>
<td class="check">Built-in priority</td>
<td class="maybe">Step fn handles signals</td>
<td class="maybe">Manual signal checking</td>
</tr>
</table>
<!-- ============ VERDICT ============ -->
<div class="verdict">
<h2>Assessment</h2>
<h3>The core tension</h3>
<p><code>merge</code> solves the right problem, but it puts the solution in the stream layer where it adds complexity to infrastructure that every processor depends on. The effect handler approach keeps it in the processor layer, but can't do true bidirectional flow (the handler blocks the loop).</p>
<h3>Where each approach fits</h3>
<p><strong>Sources (MicSource, WavSource)</strong> &mdash; <code>merge</code> is the right fit. The source IS the worker. There's no "upstream" to interleave with, just the worker's output plus signals. Keeping <code>merge</code> for sources is clean.</p>
<p><strong>Processors (KokoroTTS)</strong> &mdash; Till's current design uses <code>merge</code> here because KokoroTTS needs to receive text and emit audio concurrently. The effect handler approach (B) can't do this without blocking. But: does KokoroTTS actually need true concurrency? In practice, it receives all text, then <code>EndFrame</code> closes input, then it drains audio. The text and audio don't truly overlap &mdash; they're sequential phases.</p>
<p>If the phases are sequential, the effect handler works fine. If they genuinely overlap (streaming text in while streaming audio out), <code>merge</code> is necessary.</p>
<h3>Possible middle ground</h3>
<p>Keep <code>merge</code> as a stream primitive but consider whether it needs to be a first-class context manager on <code>InterruptibleStream</code>, or whether it could live in a utility that processors opt into. This would keep the stream API simple (<code>push</code> for sources, plain iteration for processors) while making <code>merge</code> available as a composition helper for the rare processor that needs true bidirectional flow.</p>
</div>
</body>
</html>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment