HTML explainer for Electric issue #4340
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Issue #4340: Pull-wake claim release bug</title>
<style>
:root {
color-scheme: light dark;
--bg: #fbfaf7;
--fg: #20201d;
--muted: #68665f;
--card: #ffffff;
--line: #dfd9cc;
--accent: #d45d2f;
--accent-soft: #fff0e9;
--code: #f4efe6;
--ok: #2d7d46;
--bad: #b33a3a;
--warn: #ad741f;
}
@media (prefers-color-scheme: dark) {
:root {
--bg: #181715;
--fg: #f3efe7;
--muted: #b9b1a5;
--card: #22201d;
--line: #3a352f;
--accent: #ff946b;
--accent-soft: #321f18;
--code: #2b2824;
--ok: #82d99a;
--bad: #ff8b8b;
--warn: #ffc46b;
}
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
background:
radial-gradient(circle at 20% 0%, rgba(212, 93, 47, 0.13), transparent 34rem),
var(--bg);
color: var(--fg);
line-height: 1.55;
}
main {
width: min(980px, calc(100% - 32px));
margin: 0 auto;
padding: 56px 0 80px;
}
h1 {
font-size: clamp(2.1rem, 5vw, 4.5rem);
line-height: 0.98;
letter-spacing: -0.06em;
margin: 0 0 18px;
max-width: 900px;
}
h2 {
font-size: clamp(1.45rem, 3vw, 2.35rem);
letter-spacing: -0.04em;
margin: 56px 0 16px;
}
h3 {
font-size: 1.08rem;
margin: 0 0 10px;
}
p { margin: 0 0 16px; }
a { color: var(--accent); }
code, pre {
font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", monospace;
}
pre {
overflow-x: auto;
padding: 16px;
border-radius: 14px;
background: var(--code);
border: 1px solid var(--line);
font-size: 0.9rem;
}
.lede {
font-size: clamp(1.1rem, 2vw, 1.45rem);
color: var(--muted);
max-width: 780px;
}
.badge {
display: inline-flex;
align-items: center;
gap: 8px;
border: 1px solid var(--line);
background: var(--card);
padding: 8px 12px;
border-radius: 999px;
color: var(--muted);
font-size: 0.92rem;
margin-bottom: 22px;
}
.card {
background: color-mix(in oklab, var(--card), transparent 5%);
border: 1px solid var(--line);
border-radius: 22px;
padding: 22px;
box-shadow: 0 20px 60px rgba(0, 0, 0, 0.06);
}
.grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(230px, 1fr));
gap: 16px;
margin-top: 20px;
}
.diagram {
display: grid;
grid-template-columns: 1fr auto 1fr;
gap: 12px;
align-items: center;
margin: 24px 0;
}
.box {
border: 1px solid var(--line);
background: var(--card);
border-radius: 18px;
padding: 18px;
min-height: 105px;
}
.arrow {
font-size: 2rem;
color: var(--accent);
}
.pill {
display: inline-block;
border-radius: 999px;
padding: 3px 9px;
font-size: 0.8rem;
font-weight: 700;
margin-right: 6px;
}
.bad { color: var(--bad); }
.ok { color: var(--ok); }
.warn { color: var(--warn); }
.pill.bad { background: color-mix(in oklab, var(--bad), transparent 84%); }
.pill.ok { background: color-mix(in oklab, var(--ok), transparent 84%); }
.pill.warn { background: color-mix(in oklab, var(--warn), transparent 84%); }
.timeline {
display: grid;
gap: 14px;
margin-top: 18px;
}
.step {
display: grid;
grid-template-columns: 42px 1fr;
gap: 14px;
align-items: start;
}
.num {
width: 42px;
height: 42px;
border-radius: 50%;
display: grid;
place-items: center;
background: var(--accent-soft);
color: var(--accent);
font-weight: 800;
border: 1px solid var(--line);
}
.callout {
border-left: 5px solid var(--accent);
background: var(--accent-soft);
border-radius: 14px;
padding: 18px;
margin: 24px 0;
}
.compare {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
gap: 16px;
}
ul { padding-left: 1.25rem; }
@media (max-width: 680px) {
.diagram { grid-template-columns: 1fr; }
.arrow { transform: rotate(90deg); justify-self: center; }
}
</style>
</head>
<body>
<main>
<div class="badge">Electric Agents · Issue #4340 · Pull-wake claim release</div>
<h1>A claim can finish, but the database never hears about it.</h1>
<p class="lede">
Issue #4340 is about a release path that trusts an in-memory token too much.
If that token disappears, the server skips the database cleanup, leaving a claim
marked <code>active</code> and an entity stuck <code>running</code>.
</p>
<section class="card">
<h2 style="margin-top: 0">The moving parts</h2>
<div class="diagram">
<div class="box">
<h3>In memory</h3>
<p><code>ClaimWriteTokenStore</code></p>
<p>Answers: “does this process still own the stream?”</p>
</div>
<div class="arrow">→</div>
<div class="box">
<h3>In Postgres</h3>
<p><code>consumer_claims</code></p>
<p>Records: “which claim is active or released?”</p>
</div>
</div>
<p>
The bug is that the database release is gated by the in-memory ownership check.
But memory can be lost. The database row is the durable fact.
</p>
</section>
<section>
<h2>The current shape of the bug</h2>
<p>In <code>callbackForward</code>, when a runner sends <code>done: true</code>, the code does this:</p>
<pre><code>const stillOwnsClaim = claimWriteTokens.owns(service, stream, consumerId)
const entity = await registry.getEntityByStream(stream)

if (entity && stillOwnsClaim) {
  await registry.materializeReleasedClaim({ consumerId, epoch })
  await registry.updateStatus(entity.url, 'idle')
  claimWriteTokens.clearStream(service, stream)
  await entityBridgeManager.onEntityChanged(entity.url)
}</code></pre>
<div class="callout">
<strong>The problem:</strong> <code>materializeReleasedClaim</code> only runs when
<code>stillOwnsClaim</code> is true. If memory says “no,” the durable DB row is never released.
</div>
</section>
<section>
<h2>How the leak happens</h2>
<div class="timeline">
<div class="step">
<div class="num">1</div>
<div class="card">
<h3>A wake is claimed</h3>
<p>The server writes an active row to <code>consumer_claims</code>, sets the dispatch state to active, and marks the entity <code>running</code>.</p>
</div>
</div>
<div class="step">
<div class="num">2</div>
<div class="card">
<h3>The in-memory token disappears</h3>
<p>This can happen because the server restarts, or because a newer wake evicts the old token for the same stream.</p>
</div>
</div>
<div class="step">
<div class="num">3</div>
<div class="card">
<h3>The runner sends <code>done</code></h3>
<p>The callback contains the durable identity: <code>consumerId</code> and <code>epoch</code>.</p>
</div>
</div>
<div class="step">
<div class="num">4</div>
<div class="card">
<h3>The server skips release</h3>
<p><code>stillOwnsClaim</code> is false, so the server logs the <code>done</code> callback as stale and never calls <code>materializeReleasedClaim</code>.</p>
</div>
</div>
</div>
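<p>
The four steps above can be reproduced with a small in-memory model. This is a minimal
sketch, not the real server code: <code>consumerClaims</code>, <code>writeTokens</code>,
<code>claimWake</code>, and <code>buggyDone</code> are hypothetical stand-ins for the
<code>consumer_claims</code> table, the <code>ClaimWriteTokenStore</code>, and the gated
release in <code>callbackForward</code>. The restart is simulated by emptying the token map.
</p>
<pre><code>
```typescript
// Hypothetical stand-ins; the real registry and token store are not shown in the source.
type ClaimStatus = "active" | "released"

interface ClaimRow {
  consumerId: string
  epoch: number
  status: ClaimStatus
}

// Stand-in for the consumer_claims table, keyed by "consumerId:epoch".
const consumerClaims: { [key: string]: ClaimRow } = {}

// Stand-in for ClaimWriteTokenStore: stream -> consumerId that currently owns it.
let writeTokens: { [stream: string]: string } = {}

// Step 1: a wake is claimed, so both the durable row and the memory token exist.
function claimWake(stream: string, consumerId: string, epoch: number): void {
  consumerClaims[consumerId + ":" + epoch] = { consumerId, epoch, status: "active" }
  writeTokens[stream] = consumerId
}

// Mirrors the gated release: the durable row is only touched when memory agrees.
function buggyDone(stream: string, consumerId: string, epoch: number): void {
  const stillOwnsClaim = writeTokens[stream] === consumerId
  if (stillOwnsClaim) {
    consumerClaims[consumerId + ":" + epoch].status = "released"
    delete writeTokens[stream]
  }
}

claimWake("stream-1", "consumer-a", 1)
writeTokens = {}                        // step 2: simulated restart, memory is gone
buggyDone("stream-1", "consumer-a", 1)  // steps 3 and 4: done arrives, release is skipped

const leakedStatus: ClaimStatus = consumerClaims["consumer-a:1"].status
console.log(leakedStatus) // "active": the durable row was never released
```
</code></pre>
<p>
The callback carried everything needed to find the row (<code>consumerId</code> and
<code>epoch</code>), yet the row was left untouched because the only gate was the lost token.
</p>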
</section>
<section>
<h2>The visible symptoms</h2>
<div class="grid">
<div class="card">
<span class="pill bad">leak</span>
<h3><code>consumer_claims</code></h3>
<p>The claim row stays <code>active</code> indefinitely.</p>
</div>
<div class="card">
<span class="pill bad">stuck</span>
<h3>Entity status</h3>
<p>The entity can remain <code>running</code> instead of returning to <code>idle</code>.</p>
</div>
<div class="card">
<span class="pill warn">diagnostic</span>
<h3>Runner health</h3>
<p>The health endpoint can reveal active claims that should have been released.</p>
</div>
</div>
</section>
<section>
<h2>The key distinction</h2>
<div class="compare">
<div class="card">
<span class="pill ok">correct gate</span>
<h3>Write authorization</h3>
<p>
Clearing or using the in-memory write token should require
<code>stillOwnsClaim</code>. You do not want an old claim clearing a newer claim’s token.
</p>
</div>
<div class="card">
<span class="pill bad">wrong gate</span>
<h3>DB claim release</h3>
<p>
Releasing <code>consumer_claims</code> should be keyed by durable identity:
<code>(tenantId, consumerId, epoch)</code>. It should not depend on process memory.
</p>
</div>
</div>
</section>
<section>
<h2>The fix</h2>
<p>Split one big gate into three smaller gates:</p>
<pre><code>// Computed exactly as before.
const stillOwnsClaim = claimWriteTokens.owns(service, stream, consumerId)
const entity = await registry.getEntityByStream(stream)

// 1. Release the durable DB claim whenever the done callback has an epoch.
const release = epoch === undefined
  ? undefined
  : await registry.materializeReleasedClaim({ consumerId, epoch, ackedStreams })

// 2. Mark the entity idle only if this release cleared the active dispatch row,
//    or if memory still says this process owns the claim.
if (entity && (release?.entityCleared || stillOwnsClaim)) {
  await registry.updateStatus(entity.url, 'idle')
  await entityBridgeManager.onEntityChanged(entity.url)
}

// 3. Clear memory only when memory still owns the stream.
if (stillOwnsClaim) {
  claimWriteTokens.clearStream(service, stream)
}</code></pre>
<div class="callout">
The registry should return an <code>entityCleared</code> flag so the caller knows
whether the released claim was actually the active dispatch claim.
</div>
</section>
<section>
<h2>Why <code>entityCleared</code> matters</h2>
<p>
An older wake can finish after a newer wake has already claimed the same entity.
In that case, the old <code>consumer_claims</code> row should be marked released,
but the entity should not be marked idle if the newer wake is still running.
</p>
<pre><code>old wake finishes late
├─ release old claim row: yes
├─ clear current dispatch state: only if it still points at old wake
└─ mark entity idle: only if current dispatch state was cleared</code></pre>
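<p>
One way the registry side of this could look is sketched below. It assumes a synchronous,
in-memory model: <code>RegistrySketch</code>, <code>claim</code>, <code>statusOf</code>, and
the single <code>activeDispatch</code> pointer are hypothetical and stand in for
<code>consumer_claims</code> and <code>entity_dispatch_state</code>, whose real schema is
not shown here. Only the <code>materializeReleasedClaim</code> name and the
<code>entityCleared</code> flag come from the proposal above.
</p>
<pre><code>
```typescript
// Hypothetical in-memory model; field names are assumptions, not the real schema.
type ClaimStatus = "active" | "released"

interface ClaimKey {
  consumerId: string
  epoch: number
}

class RegistrySketch {
  // consumer_claims stand-in, keyed by "consumerId:epoch".
  private claims: { [key: string]: ClaimStatus } = {}
  // Dispatch-state stand-in: the claim the entity is currently running for, if any.
  private activeDispatch: ClaimKey | undefined = undefined

  claim(key: ClaimKey): void {
    this.claims[key.consumerId + ":" + key.epoch] = "active"
    this.activeDispatch = key // a newer wake replaces the previous dispatch pointer
  }

  statusOf(key: ClaimKey): ClaimStatus | undefined {
    return this.claims[key.consumerId + ":" + key.epoch]
  }

  // Always releases the claim row if it exists; clears the dispatch state
  // only if it still points at this exact claim.
  materializeReleasedClaim(key: ClaimKey): { entityCleared: boolean } {
    const id = key.consumerId + ":" + key.epoch
    if (id in this.claims) {
      this.claims[id] = "released"
    }
    const current = this.activeDispatch
    if (current === undefined || current.consumerId !== key.consumerId || current.epoch !== key.epoch) {
      return { entityCleared: false }
    }
    this.activeDispatch = undefined
    return { entityCleared: true }
  }
}

const registry = new RegistrySketch()
const oldWake: ClaimKey = { consumerId: "consumer-a", epoch: 1 }
const newWake: ClaimKey = { consumerId: "consumer-b", epoch: 2 }

registry.claim(oldWake)
registry.claim(newWake) // the newer wake now owns the dispatch state

const late = registry.materializeReleasedClaim(oldWake)   // old wake finishes late
const onTime = registry.materializeReleasedClaim(newWake) // newer wake finishes

console.log(late.entityCleared, onTime.entityCleared) // false true
```
</code></pre>
<p>
Both rows end up released, but only the second call reports
<code>entityCleared: true</code>, so only that call would justify moving the entity to
<code>idle</code>.
</p>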
</section>
<section>
<h2>Should memory be a read-through cache?</h2>
<p>
Not necessarily. The in-memory token store is on latency-sensitive write paths,
especially stream appends. Every append checks whether the presented write token
is valid for the entity stream. Making that check always hit Postgres would add
database latency and load to a hot path.
</p>
<pre><code>// Hot path: stream append
const token = writeTokenFromHeaders(request.headers)
if (!manager.isValidWriteToken(entity, token)) {
  return apiError(401, 'UNAUTHORIZED', 'Invalid write token')
}</code></pre>
<p>
So the better fix is not “make all token checks DB-backed.” The better fix is
to keep the fast in-memory path for write authorization, but stop using it as
the source of truth for durable claim lifecycle.
</p>
<div class="compare">
<div class="card">
<span class="pill ok">keep in memory</span>
<h3>Write-token validation</h3>
<p>
Appends and tag writes can stay fast. The in-memory store answers:
“is this token currently allowed to write to this stream?”
</p>
</div>
<div class="card">
<span class="pill ok">use Postgres</span>
<h3>Claim lifecycle</h3>
<p>
Completion should be durable. Postgres answers:
“has claim <code>(consumerId, epoch)</code> been released?”
</p>
</div>
</div>
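<p>
To make the "keep in memory" half concrete, here is a minimal sketch of a token store with
one current owner per stream. <code>TokenStoreSketch</code> and <code>issue</code> are
hypothetical, and the sketch keys by stream only, whereas the calls shown earlier also pass a
<code>service</code> or an entity. The real <code>ClaimWriteTokenStore</code> may differ.
</p>
<pre><code>
```typescript
// Hypothetical sketch of an in-memory write-token store with one owner per stream.
interface TokenEntry {
  consumerId: string
  token: string
}

class TokenStoreSketch {
  private byStream: { [stream: string]: TokenEntry | undefined } = {}

  // Issuing a token for a stream evicts whatever token was there before.
  issue(stream: string, consumerId: string, token: string): void {
    this.byStream[stream] = { consumerId, token }
  }

  // Hot-path check: a plain object lookup, no database round trip.
  isValidWriteToken(stream: string, token: string): boolean {
    const entry = this.byStream[stream]
    return entry !== undefined ? entry.token === token : false
  }

  owns(stream: string, consumerId: string): boolean {
    const entry = this.byStream[stream]
    return entry !== undefined ? entry.consumerId === consumerId : false
  }

  clearStream(stream: string): void {
    delete this.byStream[stream]
  }
}

const tokens = new TokenStoreSketch()
tokens.issue("stream-1", "consumer-a", "token-1")
tokens.issue("stream-1", "consumer-b", "token-2") // a newer wake evicts the old token

console.log(tokens.isValidWriteToken("stream-1", "token-1")) // false

// The old consumer finishing late must not clear the newer token.
if (tokens.owns("stream-1", "consumer-a")) {
  tokens.clearStream("stream-1")
}
console.log(tokens.isValidWriteToken("stream-1", "token-2")) // true
```
</code></pre>
<p>
This is why the ownership check is the right gate for clearing the token and the wrong gate
for releasing the database row: the store only knows about the current owner, and it forgets
everything on restart.
</p>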
</section>
<section>
<h2>The recommended split: option D</h2>
<p>
The system has two representations because it has two jobs. Keep them separate.
</p>
<pre><code>In-memory ClaimWriteTokenStore
  answers:  current write authority
  used by:  appends, tag writes, clearing current token
  property: fast, ephemeral, one current owner per stream

Postgres consumer_claims + entity_dispatch_state
  answers:  durable claim lifecycle and active dispatch state
  used by:  claim release, idle transition, health diagnostics
  property: persistent, queryable, survives restart</code></pre>
<div class="callout">
In short: <strong>memory protects live writes</strong>;
<strong>Postgres records claim lifecycle</strong>.
</div>
</section>
<section class="card">
<h2 style="margin-top: 0">Bottom line</h2>
<p>
The issue is real: the code conflates ephemeral ownership with durable lifecycle state.
</p>
<p>
<strong>Memory should protect writes.</strong><br />
<strong>Postgres should decide claim lifecycle.</strong>
</p>
</section>
</main>
</body>
</html>