|
<!doctype html> |
|
<html lang="en"> |
|
<head> |
|
<meta charset="utf-8"> |
|
<meta name="viewport" content="width=device-width, initial-scale=1"> |
|
<title>AI Prediction Verification Calibration Gate</title> |
|
<style> |
|
:root { |
|
--ink: #171717; |
|
--muted: #66615b; |
|
--line: #d8d1c7; |
|
--paper: #faf8f3; |
|
--panel: #ffffff; |
|
--amber: #d28a16; |
|
--blue: #1e5d8f; |
|
--green: #19705f; |
|
--red: #b14d42; |
|
--shadow: 0 16px 40px rgba(38, 33, 26, 0.10); |
|
} |
|
* { box-sizing: border-box; } |
|
body { |
|
margin: 0; |
|
background: var(--paper); |
|
color: var(--ink); |
|
font-family: ui-serif, Georgia, "Times New Roman", serif; |
|
line-height: 1.45; |
|
} |
|
header { |
|
padding: 42px 6vw 30px; |
|
border-bottom: 1px solid var(--line); |
|
background: linear-gradient(90deg, #fffdf8 0%, #f5efe4 100%); |
|
} |
|
.eyebrow { |
|
color: var(--blue); |
|
font: 700 12px/1.2 ui-monospace, SFMono-Regular, Menlo, monospace; |
|
letter-spacing: .08em; |
|
text-transform: uppercase; |
|
} |
|
h1 { |
|
max-width: 980px; |
|
margin: 12px 0 12px; |
|
font-size: clamp(36px, 6vw, 76px); |
|
line-height: .95; |
|
letter-spacing: 0; |
|
} |
|
.lede { |
|
max-width: 900px; |
|
color: var(--muted); |
|
font-size: 20px; |
|
} |
|
main { |
|
padding: 28px 6vw 56px; |
|
display: grid; |
|
gap: 22px; |
|
} |
|
section { |
|
background: var(--panel); |
|
border: 1px solid var(--line); |
|
border-radius: 8px; |
|
box-shadow: var(--shadow); |
|
padding: 24px; |
|
} |
|
h2 { |
|
margin: 0 0 16px; |
|
font-size: 24px; |
|
} |
|
.grid { |
|
display: grid; |
|
grid-template-columns: repeat(4, minmax(0, 1fr)); |
|
gap: 14px; |
|
} |
|
.two { |
|
display: grid; |
|
grid-template-columns: minmax(0, 1fr) minmax(0, 1fr); |
|
gap: 16px; |
|
} |
|
.card { |
|
border: 1px solid var(--line); |
|
border-radius: 8px; |
|
padding: 16px; |
|
background: #fffdf8; |
|
} |
|
.tag { |
|
display: inline-block; |
|
border: 1px solid var(--line); |
|
border-radius: 999px; |
|
padding: 3px 9px; |
|
margin-bottom: 10px; |
|
color: var(--muted); |
|
font: 700 11px/1.2 ui-monospace, SFMono-Regular, Menlo, monospace; |
|
} |
|
ul, ol { margin: 0; padding-left: 20px; } |
|
li { margin: 7px 0; } |
|
table { |
|
width: 100%; |
|
border-collapse: collapse; |
|
font-size: 15px; |
|
} |
|
th, td { |
|
border-bottom: 1px solid var(--line); |
|
padding: 10px 8px; |
|
text-align: left; |
|
vertical-align: top; |
|
} |
|
th { |
|
color: var(--blue); |
|
font: 700 12px/1.2 ui-monospace, SFMono-Regular, Menlo, monospace; |
|
text-transform: uppercase; |
|
letter-spacing: .04em; |
|
} |
|
.status-pass { color: var(--green); font-weight: 700; } |
|
.status-watch { color: var(--amber); font-weight: 700; } |
|
.status-stop { color: var(--red); font-weight: 700; } |
|
code { |
|
background: #f2ece1; |
|
border: 1px solid var(--line); |
|
border-radius: 5px; |
|
padding: 1px 5px; |
|
font-family: ui-monospace, SFMono-Regular, Menlo, monospace; |
|
font-size: .92em; |
|
} |
|
@media (max-width: 920px) { |
|
.grid, .two { grid-template-columns: 1fr; } |
|
header, main { padding-left: 18px; padding-right: 18px; } |
|
} |
|
</style> |
|
</head> |
|
<body> |
|
<header> |
|
<div class="eyebrow">PLS purpose_e2e_toolbox_v2 / primary_artifact</div> |
|
<h1>AI Prediction Verification Calibration Gate</h1> |
|
<p class="lede">This pack prevents another vague rebuild of the prediction verification module. It defines the gate that decides whether the team should accept the label policy, seed evidence, run a batch trial, or reopen data-source work before any new engineering cycle is dispatched.</p> |
|
</header> |
|
|
|
<main> |
|
<section> |
|
<h2>Thirty-Day Path</h2> |
|
<div class="grid"> |
|
<div class="card"><span class="tag">D1</span><strong>Policy lock</strong><br>Louis accepts the hit/miss/unknown label policy and selects 10 prior AI review predictions as seed cases.</div> |
|
<div class="card"><span class="tag">D7</span><strong>Batch trial</strong><br>50 predictions are auto-labeled from signals, action items, commits, worker logs, and review notes. Unknown rate must be below 25%.</div> |
|
<div class="card"><span class="tag">D14</span><strong>Correction routing</strong><br>Miss reasons route to owners: direction gap, evidence gap, resource gap, authorization gap, or execution drift.</div> |
|
<div class="card"><span class="tag">D30</span><strong>Review operating loop</strong><br>Calibration score becomes part of the weekly company AI review, with trend, owner, risk, and next action visible in PLS.</div> |
|
</div> |
|
</section> |
|
|
|
<section> |
|
<h2>Purpose To Purpose E2E</h2> |
|
<div class="two"> |
|
<div class="card"> |
|
<span class="tag">Flow</span> |
|
<ol> |
|
<li>AI review produces a prediction with owner, due date, confidence, and expected evidence.</li> |
|
<li>PLS collects signals, action items, commits, deployment logs, worker completions, and human review notes.</li> |
|
<li>The matcher assigns <code>hit</code>, <code>miss</code>, or <code>unknown</code> plus evidence links.</li> |
|
<li>A human reviewer samples labels and records override reasons.</li> |
|
<li>PLS opens correction tasks for repeated miss reasons.</li> |
|
<li>Subsequent reviews use calibrated confidence, fewer vague predictions, and better owner routing.</li> |
|
</ol> |
|
</div> |
|
<div class="card"> |
|
<span class="tag">Measurable end state</span> |
|
<ul> |
|
<li>Every reviewed prediction has evidence provenance or an explicit data gap.</li> |
|
<li>Unknown labels fall below 25% by D7 and below 15% by D30.</li> |
|
<li>Miss reasons create action items instead of passive commentary.</li> |
|
<li>Weekly review decisions show whether AI predictions improved project, money, or risk indicators.</li> |
|
</ul> |
|
</div> |
|
</div> |
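<p>The unknown-rate targets in the end state can be checked mechanically. The following is a minimal Python sketch, not the PLS implementation: the names <code>UNKNOWN_LIMIT</code>, <code>unknown_rate</code>, and <code>meets_threshold</code> are hypothetical, and it assumes labels arrive as plain strings.</p>

<pre><code>from collections import Counter

# Limits taken from this pack: unknown share below 25% at D7, below 15% at D30.
UNKNOWN_LIMIT = {"D7": 0.25, "D30": 0.15}


def unknown_rate(labels):
    """Return the fraction of labels equal to "unknown"."""
    if not labels:
        # Assumption: an empty batch has no meaningful rate, so refuse it.
        raise ValueError("cannot compute an unknown rate for an empty batch")
    return Counter(labels)["unknown"] / len(labels)


def meets_threshold(labels, checkpoint):
    """True when the unknown rate is strictly below the checkpoint limit."""
    return UNKNOWN_LIMIT[checkpoint] > unknown_rate(labels)
</code></pre>

<p>For a 50-label batch with 12 unknowns, <code>meets_threshold(labels, "D7")</code> is true (24%) while <code>meets_threshold(labels, "D30")</code> is false, so the same batch would pass the D7 trial but not the D30 target.</p>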
|
</section> |
|
|
|
<section> |
|
<h2>Preflight Decision Gate</h2> |
|
<table> |
|
<thead><tr><th>Condition</th><th>Decision</th><th>Owner</th><th>Acceptance</th></tr></thead> |
|
<tbody> |
|
<tr><td>Prior production pack exists but label policy is not accepted.</td><td class="status-stop">Stop rebuild</td><td>Louis</td><td>Accept label policy and seed set before engineering.</td></tr> |
|
<tr><td>Policy accepted but fewer than 10 seed predictions exist.</td><td class="status-watch">Seed first</td><td>Louis + zihrou</td><td>10 predictions with expected evidence and review date.</td></tr> |
|
<tr><td>Seed set exists but D7 batch has not run.</td><td class="status-watch">Run batch trial</td><td>iron</td><td>50 labels, unknown below 25%, reviewer sample completed.</td></tr> |
|
<tr><td>Unknown rate at or above 25%.</td><td class="status-stop">Open data-source gap</td><td>iron</td><td>Missing source mapped to API/sync owner and due date.</td></tr> |
|
<tr><td>Hit/miss labels stable and correction routes active.</td><td class="status-pass">Proceed to productization</td><td>Louis</td><td>D14 correction routing and D30 review dashboard accepted.</td></tr> |
|
</tbody> |
|
</table> |
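<p>The table can be read as an ordered rule list in which the first matching row wins. The Python sketch below illustrates that reading under stated assumptions: <code>GateState</code> and <code>preflight_decision</code> are hypothetical names, the first row is simplified to "policy not accepted", exactly 25% unknown is treated as failing because acceptance requires a rate below 25%, and the final fallback is not a row in the table.</p>

<pre><code>from dataclasses import dataclass

SEED_MIN = 10          # seed predictions required before the batch trial
UNKNOWN_LIMIT = 0.25   # D7 acceptance requires a rate below this value


@dataclass
class GateState:
    """Hypothetical snapshot of the facts the gate table checks."""
    policy_accepted: bool
    seed_count: int
    batch_ran: bool
    unknown_rate: float   # fraction of batch labels that are "unknown"
    routes_active: bool   # labels stable and correction routes active


def preflight_decision(state):
    """Return (decision, owner) for the first table row that applies."""
    if not state.policy_accepted:
        return ("Stop rebuild", "Louis")
    if SEED_MIN > state.seed_count:
        return ("Seed first", "Louis + zihrou")
    if not state.batch_ran:
        return ("Run batch trial", "iron")
    if state.unknown_rate >= UNKNOWN_LIMIT:
        return ("Open data-source gap", "iron")
    if state.routes_active:
        return ("Proceed to productization", "Louis")
    # Assumption: the table has no row for this case, so report it explicitly.
    return ("No gate row applies", "unassigned")
</code></pre>

<p>For example, <code>preflight_decision(GateState(True, 10, True, 0.31, False))</code> returns the "Open data-source gap" decision owned by iron, which matches the fourth row.</p>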
|
</section> |
|
|
|
<section> |
|
<h2>Value And Money Path</h2> |
|
<div class="two"> |
|
<div class="card"> |
|
<span class="tag">Economic logic</span> |
|
<ul> |
|
<li>Revenue: better AI review predictions identify accounts and internal projects that are ready for monetizable delivery.</li> |
|
<li>Cost: fewer repeated AI work dispatches when the real blocker is policy, seed data, or adoption.</li> |
|
<li>Risk: false confidence is exposed before it shapes staffing, roadmap, or client commitments.</li> |
|
<li>Conversion: prediction evidence gives teams a clearer reason to adopt AI operating routines.</li> |
|
</ul> |
|
</div> |
|
<div class="card"> |
|
<span class="tag">Human capability lift</span> |
|
<ul> |
|
<li>Louis gets a calibration view instead of another text-only status report.</li> |
|
<li>zihrou can separate direction, resource, and authorization misses.</li> |
|
<li>iron can see exact evidence gaps for worker, repo, and signal ingestion.</li> |
|
<li>Project owners learn to write predictions that are testable, not performative.</li> |
|
</ul> |
|
</div> |
|
</div> |
|
</section> |
|
|
|
<section> |
|
<h2>Solution Stack</h2> |
|
<table> |
|
<thead><tr><th>Layer</th><th>Production decision</th><th>Artifact in pack</th></tr></thead> |
|
<tbody> |
|
<tr><td>Context framework</td><td>Prediction is a claim with expected evidence, owner, due date, confidence, and review window.</td><td><code>production-brief.md</code></td></tr> |
|
<tr><td>Workflow</td><td>Policy accept -> seed -> batch match -> reviewer sample -> correction route -> review dashboard.</td><td>This HTML gate</td></tr> |
|
<tr><td>Data model</td><td>Prediction, evidence event, match result, reviewer override, correction task, calibration summary.</td><td><code>data-model.md</code></td></tr> |
|
<tr><td>Tool/app</td><td>PLS dispatcher preflight that blocks duplicate build jobs until acceptance gates are met.</td><td>This HTML gate</td></tr> |
|
<tr><td>Acceptance</td><td>Unknown threshold, reviewer sample rate, owner and due date, and correction routing are testable.</td><td><code>acceptance-tests.md</code></td></tr> |
|
<tr><td>Adoption upgrade</td><td>Weekly scorecard and learning memory tell the next worker what must happen next.</td><td><code>learning-memory.json</code></td></tr> |
|
</tbody> |
|
</table> |
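<p>To make the context-framework and data-model rows concrete, here is a minimal Python sketch of two of the listed records. It is an illustration only and does not reproduce <code>data-model.md</code>: the class names, field names, and the <code>has_provenance</code> check are assumptions, as is the choice to store confidence as a probability between 0 and 1.</p>

<pre><code>from dataclasses import dataclass, field
from datetime import date

LABELS = ("hit", "miss", "unknown")


@dataclass
class Prediction:
    """A claim with the attributes named in the context framework row."""
    claim: str
    owner: str
    due: date
    confidence: float        # assumed to be a probability in [0, 1]
    expected_evidence: str


@dataclass
class MatchResult:
    """Matcher output: a label plus evidence links or a stated data gap."""
    prediction: Prediction
    label: str
    evidence_links: list = field(default_factory=list)
    data_gap: str = ""       # explicit note when evidence is missing

    def has_provenance(self):
        """True when the label is valid and backed by links or a data gap."""
        return self.label in LABELS and bool(self.evidence_links or self.data_gap)
</code></pre>

<p>A result with an <code>unknown</code> label and no links passes <code>has_provenance()</code> only when <code>data_gap</code> is filled in, which mirrors the requirement that every reviewed prediction has evidence provenance or an explicit data gap.</p>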
|
</section> |
|
|
|
<section> |
|
<h2>Owner, Due, Acceptance</h2> |
|
<ul> |
|
<li>Owner: Louis. Reviewers: zihrou and iron.</li> |
|
<li>Due: 2026-05-27 for policy acceptance and 10 seed predictions; 2026-05-31 for the first 50-case batch trial.</li> |
|
<li>Acceptance: label policy accepted, seed set complete, D7 unknown rate below 25%, reviewer sample finished, decision record present, and next correction task opened.</li> |
|
<li>People sync: send only a short LINE summary; the primary durable artifact is this pack and its Gist URL.</li> |
|
</ul> |
|
</section> |
|
</main> |
|
</body> |
|
</html> |