@esz135888
Created May 23, 2026 20:43
PLS job 614b8f50 AI prediction D30 weekly scorecard adoption gate

Acceptance Tests

Test 1: Scorecard Input Completeness

Given the D7 calibration and D14 correction artifacts, when the D30 weekly scorecard is generated, then it must include the calibration run id, correction batch id, evidence snapshot, and decision record reference.

Pass:

  • calibration_run_id exists.
  • correction_batch_id exists.
  • decision_record_ref exists.
  • scorecard week boundary exists.
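The Test 1 Pass list can be checked mechanically. Below is a minimal sketch in Python; the helper name `missing_scorecard_inputs` is hypothetical. It follows the Pass bullets, not the Given/When/Then sentence, so it does not check the evidence snapshot. It also assumes that the "scorecard week boundary" is the `week_start` column from the data model.

```python
from collections.abc import Mapping
from typing import Any

# Field names follow weekly_prediction_scorecard; treating week_start as the
# "scorecard week boundary" is an assumption.
REQUIRED_INPUT_FIELDS = (
    "calibration_run_id",
    "correction_batch_id",
    "decision_record_ref",
    "week_start",
)


def missing_scorecard_inputs(scorecard: Mapping[str, Any]) -> list[str]:
    """Return the Test 1 fields that are absent or blank (hypothetical helper)."""
    missing = []
    for field in REQUIRED_INPUT_FIELDS:
        value = scorecard.get(field)
        # None and whitespace-only strings both count as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


draft = {
    "calibration_run_id": "run-d7",
    "correction_batch_id": "batch-d14",
    "decision_record_ref": "",
    "week_start": "2026-06-01",
}
print(missing_scorecard_inputs(draft))  # ['decision_record_ref']
```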

Test 2: Gate Calculation

Given metric snapshot values, when the gate logic runs, then the adoption gate must be deterministic: the same snapshot always yields the same gate.

Pass:

  • routed non-hit <100% returns repair_first.
  • unresolved gap >20% returns repair_first.
  • unknown >=25% returns repair_first.
  • reviewer sample <5 returns repair_first.
  • reviewer agreement <80% returns repair_first.
  • all thresholds pass returns ship_weekly_scorecard.

Test 3: Human Approval Boundary

Given a worker recommendation, when the final gate is recorded, then only the owner or a delegated owner can approve the final gate.

Pass:

  • worker recommendation exists.
  • final gate has decided_by.
  • decision rationale is not blank.
  • audit timestamp exists.
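The Test 3 Pass list could be enforced before a final gate is written, as in the sketch below. Only the field names come from the adoption_gate_decision table. `GateDecision`, `approval_errors`, and the user ids are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

GATES = {"ship_weekly_scorecard", "repair_first", "blocked"}


@dataclass
class GateDecision:
    recommended_gate: Optional[str]  # worker recommendation
    final_gate: Optional[str]
    decided_by: Optional[str]        # user id of the approver
    rationale: str
    decided_at: Optional[datetime]   # audit timestamp


def approval_errors(decision: GateDecision, approver_ids: set[str]) -> list[str]:
    """Return Test 3 violations; an empty list means the final gate may be recorded."""
    errors = []
    if decision.recommended_gate not in GATES:
        errors.append("worker recommendation missing")
    if decision.final_gate not in GATES:
        errors.append("final gate missing")
    # Only the owner or a delegated owner may approve; the worker never can.
    if decision.decided_by not in approver_ids:
        errors.append("decided_by is not the owner or a delegated owner")
    if not decision.rationale.strip():
        errors.append("rationale is blank")
    if decision.decided_at is None:
        errors.append("audit timestamp missing")
    return errors


owner_ids = {"louis-user-id"}  # hypothetical id for Louis; delegated owners go here too
by_worker = GateDecision("repair_first", "repair_first", "pls-worker",
                         "source gap still open", datetime(2026, 6, 1, 9, 0))
print(approval_errors(by_worker, owner_ids))
# ['decided_by is not the owner or a delegated owner']
```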

Test 4: People Sync

Given a scorecard gate, when PLS sends the LINE/PLS backend sync, then the message must be short and ask for a concrete signal.

Pass:

  • Louis receives a ship/repair/block decision request.
  • zihrou receives rubric/reviewer backlog only when relevant.
  • iron receives source adapter backlog only when relevant.
  • no raw long report is pushed into LINE.
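The Test 4 routing rules can be expressed as in the sketch below. It rests on two assumptions. First, the 200-character cap is a placeholder for "short", which the spec does not quantify. Second, the recipient names stand in for PLS profile ids.

```python
from dataclasses import dataclass

MAX_SUMMARY_CHARS = 200  # assumption: the source only says "short"


@dataclass
class SyncMessage:
    recipient: str
    summary: str
    requested_signal: str


def plan_people_sync(gate: str, source_gaps: int, rubric_gaps: int,
                     summary: str) -> list[SyncMessage]:
    """Build Test 4 messages: Louis always, zihrou/iron only when they have backlog."""
    if len(summary) > MAX_SUMMARY_CHARS:
        # Long reports stay in the PLS backend or artifact; LINE gets a short line only.
        raise ValueError("summary too long for LINE; link to the PLS backend card instead")
    messages = [SyncMessage("Louis", summary,
                            f"Reply ship / repair / block (recommended: {gate})")]
    if rubric_gaps > 0:
        messages.append(SyncMessage("zihrou", summary,
                                    f"Clear {rubric_gaps} rubric/reviewer backlog item(s)"))
    if source_gaps > 0:
        messages.append(SyncMessage("iron", summary,
                                    f"Clear {source_gaps} source adapter backlog item(s)"))
    return messages
```

With two open source gaps and no rubric gaps, only Louis and iron are messaged.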

Test 5: Next Worker Dispatch

Given a final gate, when the heartbeat claims the next job, then the next worker action must be explicit.

Pass:

  • ship_weekly_scorecard -> build weekly dashboard/card.
  • repair_first with source gaps -> source adapter repair task.
  • repair_first with rubric gaps -> rubric repair task.
  • blocked -> Louis decision or missing data repair.
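The Test 5 mapping can be written as a small pure function that returns values from the next_worker_action enum in the data model. The spec does not say how blocked gates or unclassified repair_first cases map onto manual_review; that part of the sketch is an assumption.

```python
def next_worker_actions(final_gate: str, source_gaps: int = 0,
                        rubric_gaps: int = 0) -> list[str]:
    """Map a final gate to next_worker_action values from adoption_gate_decision."""
    if final_gate == "ship_weekly_scorecard":
        return ["build_weekly_dashboard"]
    if final_gate == "blocked":
        # Assumption: both "Louis decision" and "missing data repair" start as manual_review.
        return ["manual_review"]
    if final_gate == "repair_first":
        actions = []
        if source_gaps > 0:
            actions.append("repair_source_gap")
        if rubric_gaps > 0:
            actions.append("repair_rubric")
        # Assumption: repair_first with no classified gap goes to a human for triage.
        return actions or ["manual_review"]
    raise ValueError(f"unknown final gate: {final_gate!r}")
```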

Test 6: Completion Evidence

Given this pack is marked complete, when it is verified, then it must include a durable primary artifact, a production readiness model, E2E verification, people sync, learning memory, owner/due/acceptance, and a decision record.

Pass:

  • public artifact URL responds 200.
  • all required files are uploaded.
  • learning memory JSON validates.
  • decision record is present.
  • artifact JSON includes required PLS kinds.

Artifact URL or PR

Durable primary artifact:

https://gist.github.com/esz135888/9ce24c991bec8ea1aa7ca190ad0b3657

Gist id:

9ce24c991bec8ea1aa7ca190ad0b3657

Verification:

  • Public Gist URL responds with HTTP 200 after redirects.
  • File list includes HTML primary artifact, production brief, data model, acceptance tests, decision record, learning memory, market sources, and this artifact record.
<!doctype html>
<html lang="zh-Hant">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>AI Prediction Verification D30 Weekly Scorecard Adoption Gate</title>
<style>
:root{--ink:#17212b;--muted:#617080;--line:#d9dfe7;--bg:#f6f8fb;--panel:#fff;--blue:#2457d6;--green:#14805a;--amber:#a36300;--red:#b42318}
*{box-sizing:border-box}body{margin:0;background:var(--bg);color:var(--ink);font-family:Inter,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;line-height:1.5}
header{background:#fff;border-bottom:1px solid var(--line);padding:24px clamp(18px,4vw,48px)}main{padding:22px clamp(18px,4vw,48px) 48px;display:grid;gap:16px}
h1{margin:0;font-size:clamp(24px,3vw,36px);letter-spacing:0}h2{margin:0 0 12px;font-size:18px}h3{margin:0 0 8px;font-size:15px}p{margin:0 0 10px}
.sub{color:var(--muted);max-width:1120px;margin-top:8px}.tag{display:inline-flex;align-items:center;height:24px;padding:0 8px;border:1px solid var(--line);border-radius:999px;background:#fbfcfe;color:var(--muted);font-size:12px;margin-right:6px}
.grid{display:grid;gap:16px}.cols4{grid-template-columns:repeat(4,minmax(0,1fr))}.cols3{grid-template-columns:repeat(3,minmax(0,1fr))}.cols2{grid-template-columns:repeat(2,minmax(0,1fr))}
.panel{background:var(--panel);border:1px solid var(--line);border-radius:8px;padding:16px}.metric{min-height:118px;display:flex;flex-direction:column;justify-content:space-between}.label{font-size:13px;color:var(--muted)}.value{font-size:30px;font-weight:760}.ok{color:var(--green)}.warn{color:var(--amber)}.stop{color:var(--red)}
table{width:100%;border-collapse:collapse;font-size:13px}th,td{border-bottom:1px solid var(--line);padding:10px 8px;text-align:left;vertical-align:top}th{background:#fbfcfe;color:var(--muted)}
code{background:#eef2f7;border-radius:4px;padding:2px 5px;font-size:12px}ul{padding-left:18px;margin:0}li{margin:6px 0}.stage{border-left:4px solid var(--blue);padding-left:12px}
@media(max-width:980px){.cols4,.cols3,.cols2{grid-template-columns:1fr}}
</style>
</head>
<body>
<header>
<h1>D30 Weekly Scorecard Adoption Gate</h1>
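<p class="sub">Consolidates the D7 calibration and D14 correction routes into a scorecard that can be governed every week. This is not a showcase page but the adoption gate shared by the PLS backend, LINE, and the worker: a passing gate moves to the weekly update, and a failing gate automatically routes back to repair dispatch.</p>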
<p><span class="tag">Owner: Louis</span><span class="tag">Reviewers: zihrou / iron</span><span class="tag">Due: 2026-06-14</span><span class="tag">Gate: ship / repair / block</span></p>
</header>
<main>
<section class="grid cols4">
<div class="panel metric"><span class="label">Routed Non-hit</span><span class="value ok">100%</span><span class="label">required before weekly cadence</span></div>
<div class="panel metric"><span class="label">Unresolved Gap</span><span class="value warn">&le;20%</span><span class="label">source/rubric gaps still open</span></div>
<div class="panel metric"><span class="label">Reviewer Agreement</span><span class="value ok">&ge;80%</span><span class="label">sample or dispute routing</span></div>
<div class="panel metric"><span class="label">Adoption Gate</span><span class="value stop">3 states</span><span class="label">ship_weekly_scorecard / repair_first / blocked</span></div>
</section>
<section class="panel">
<h2>30-Day Path</h2>
<div class="grid cols4">
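<div class="stage"><h3>D1</h3><p>Confirm that the D7/D14 artifact chain, run ids, route ids, and evidence snapshot are all traceable.</p></div>
<div class="stage"><h3>D7</h3><p>The weekly scorecard schema and PLS backend card are ready; LINE pushes only a short summary and adoption options.</p></div>
<div class="stage"><h3>D14</h3><p>Automatically dispatch repair_first or ship_weekly_scorecard based on the adoption gate.</p></div>
<div class="stage"><h3>D30</h3><p>AI prediction verification becomes a weekly cadence: review trends every week and decide whether to expand, repair, or stop.</p></div>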
</div>
</section>
<section class="grid cols2">
<div class="panel">
<h2>Purpose-to-Purpose E2E</h2>
<table>
<tr><th>Input</th><th>Processing</th><th>Human adoption</th><th>Improvement metric</th></tr>
<tr><td>D7 labels + D14 routes</td><td>Compute the weekly scorecard</td><td>Louis decides ship/repair/block</td><td>AI review quality can be updated weekly</td></tr>
<tr><td>unresolved gaps</td><td>Convert to repair backlog</td><td>iron / zihrou take over source/rubric repairs</td><td>Fewer unknowns and repeat misses</td></tr>
<tr><td>rerun cohort</td><td>Compare before/after</td><td>PLS worker decides the next round</td><td>Shorter verification-to-repair cycle</td></tr>
</table>
</div>
<div class="panel">
<h2>Value / Money Path</h2>
<ul>
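<li>Revenue: extend only trusted AI review patterns to high-value workflows such as sales, customer complaints, and special requests.</li>
<li>Cost: turn reviewer checks into sampling and exception management, reducing weekly manual follow-ups.</li>
<li>Risk: use the block gate to keep unrepaired source gaps out of the management dashboard.</li>
<li>Conversion: each week Louis only needs to review the adoption gate and the top repair backlog to sign off on the next round.</li>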
</ul>
</div>
</section>
<section class="panel">
<h2>Weekly Scorecard 欄位</h2>
<table>
<tr><th>Section</th><th>Metric</th><th>Pass</th><th>Action</th></tr>
<tr><td>Quality</td><td>hit_rate, miss_rate, partial_rate, unknown_rate</td><td>unknown &lt;25%</td><td>If unknown is high, repair the source</td></tr>
<tr><td>Repair</td><td>routed_non_hit_rate, unresolved_gap_rate</td><td>100%, &le;20%</td><td>Do not ship until both pass</td></tr>
<tr><td>Human Review</td><td>reviewer_sample_count, agreement_rate</td><td>&ge;5, &ge;80%</td><td>If short, ask zihrou/iron for more samples</td></tr>
<tr><td>Economics</td><td>manual_minutes_saved, duplicate_review_avoided, decision_cycle_days</td><td>Continuous improvement</td><td>Convert labor saved into adoption value</td></tr>
<tr><td>Adoption</td><td>adoption_gate</td><td>ship_weekly_scorecard</td><td>Otherwise repair_first or blocked</td></tr>
</table>
</section>
<section class="grid cols3">
<div class="panel"><h2>Data / API</h2><p>Adds <code>weekly_prediction_scorecard</code>, <code>scorecard_metric_snapshot</code>, <code>adoption_gate_decision</code>, and <code>people_sync_event</code>. API: <code>GET /weekly-scorecards</code>, <code>POST /adoption-decisions</code>, <code>POST /people-sync-events</code>.</p></div>
<div class="panel"><h2>Permissions / Audit</h2><p>Louis can approve ship/block; zihrou/iron can update reviewer/gap status; the worker can only recommend a gate. Every gate decision preserves run ids, metric snapshot, actor, and timestamp.</p></div>
<div class="panel"><h2>Upgrading People's Capabilities</h2><p>Management moves from reading reports to reading the gate; reviewers move from case-by-case review to exception governance; the worker moves from producing files to driving the weekly cadence.</p></div>
</section>
<section class="panel">
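<h2>LINE Draft</h2>
<p>AI prediction verification has reached the D30 weekly scorecard gate. Louis, please check this week's adoption_gate: if routed=100%, unresolved_gap&le;20%, and reviewer agreement&ge;80%, approve it for the weekly report; otherwise, zihrou/iron, please repair source/rubric per the top repair backlog and re-run next week.</p>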
</section>
</main>
</body>
</html>

Data Model / API / Sync / Permissions

Tables

weekly_prediction_scorecard

| Field | Type | Required | Notes |
|---|---|---|---|
| id | uuid | yes | Scorecard id. |
| project_id | uuid | yes | PLS project. |
| week_start | date | yes | Weekly cadence boundary. |
| calibration_run_id | uuid | yes | Latest D7 run. |
| correction_batch_id | uuid | yes | Latest D14 route batch. |
| adoption_gate | enum | yes | ship_weekly_scorecard, repair_first, blocked. |
| owner_user_id | uuid | yes | Louis. |
| published_at | datetime | no | Set only when shipped. |
| decision_record_ref | text | yes | Link to decision record. |

scorecard_metric_snapshot

| Field | Type | Required | Notes |
|---|---|---|---|
| scorecard_id | uuid | yes | Parent scorecard. |
| hit_rate | decimal | yes | D7 quality metric. |
| miss_rate | decimal | yes | D7 quality metric. |
| unknown_rate | decimal | yes | Must be <0.25 to ship. |
| routed_non_hit_rate | decimal | yes | Must equal 1.0. |
| unresolved_gap_rate | decimal | yes | Must be <=0.2. |
| reviewer_sample_count | integer | yes | Must be >=5. |
| reviewer_agreement_rate | decimal | yes | Target >=0.8. |
| manual_minutes_saved | integer | no | Estimated cost saving. |
| decision_cycle_days | decimal | no | Cycle time from prediction to gate. |

adoption_gate_decision

| Field | Type | Required | Notes |
|---|---|---|---|
| id | uuid | yes | Decision id. |
| scorecard_id | uuid | yes | Parent scorecard. |
| recommended_gate | enum | yes | Worker recommendation. |
| final_gate | enum | yes | Louis-approved final gate. |
| decided_by | uuid | yes | Louis or delegated owner. |
| rationale | text | yes | Evidence-based reason. |
| next_worker_action | enum | yes | build_weekly_dashboard, repair_source_gap, repair_rubric, rerun_cohort, manual_review. |
| decided_at | datetime | yes | Audit timestamp. |

people_sync_event

| Field | Type | Required | Notes |
|---|---|---|---|
| id | uuid | yes | Sync event id. |
| scorecard_id | uuid | yes | Parent scorecard. |
| channel | enum | yes | LINE, PLS_backend, email. |
| target_profile_ids | uuid[] | yes | Louis, zihrou, iron. |
| summary | text | yes | Short outer message. |
| requested_signal | text | yes | Reply or action expected from each person. |
| sent_at | datetime | no | Set after verified send. |
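The four tables can be sketched as SQLite DDL, driven from Python's standard `sqlite3` module. This sketch makes several assumptions:

- uuid columns are stored as TEXT and datetimes as ISO-8601 text.
- Enums become CHECK constraints.
- `target_profile_ids` (uuid[]) is stored as a JSON array string.

The real PLS backend storage may differ. The table-level CHECK encodes "published_at: set only when shipped".

```python
import sqlite3

# Assumed storage mapping: uuid/date/datetime -> TEXT, decimal -> REAL, enum -> CHECK.
SCHEMA = """
CREATE TABLE weekly_prediction_scorecard (
    id TEXT PRIMARY KEY,
    project_id TEXT NOT NULL,
    week_start TEXT NOT NULL,
    calibration_run_id TEXT NOT NULL,
    correction_batch_id TEXT NOT NULL,
    adoption_gate TEXT NOT NULL
        CHECK (adoption_gate IN ('ship_weekly_scorecard', 'repair_first', 'blocked')),
    owner_user_id TEXT NOT NULL,
    published_at TEXT,
    decision_record_ref TEXT NOT NULL,
    -- published_at is set only when the scorecard ships
    CHECK (published_at IS NULL OR adoption_gate = 'ship_weekly_scorecard')
);
CREATE TABLE scorecard_metric_snapshot (
    scorecard_id TEXT NOT NULL REFERENCES weekly_prediction_scorecard(id),
    hit_rate REAL NOT NULL,
    miss_rate REAL NOT NULL,
    unknown_rate REAL NOT NULL,
    routed_non_hit_rate REAL NOT NULL,
    unresolved_gap_rate REAL NOT NULL,
    reviewer_sample_count INTEGER NOT NULL,
    reviewer_agreement_rate REAL NOT NULL,
    manual_minutes_saved INTEGER,
    decision_cycle_days REAL
);
CREATE TABLE adoption_gate_decision (
    id TEXT PRIMARY KEY,
    scorecard_id TEXT NOT NULL REFERENCES weekly_prediction_scorecard(id),
    recommended_gate TEXT NOT NULL
        CHECK (recommended_gate IN ('ship_weekly_scorecard', 'repair_first', 'blocked')),
    final_gate TEXT NOT NULL
        CHECK (final_gate IN ('ship_weekly_scorecard', 'repair_first', 'blocked')),
    decided_by TEXT NOT NULL,
    rationale TEXT NOT NULL CHECK (length(trim(rationale)) > 0),
    next_worker_action TEXT NOT NULL CHECK (next_worker_action IN (
        'build_weekly_dashboard', 'repair_source_gap', 'repair_rubric',
        'rerun_cohort', 'manual_review')),
    decided_at TEXT NOT NULL
);
CREATE TABLE people_sync_event (
    id TEXT PRIMARY KEY,
    scorecard_id TEXT NOT NULL REFERENCES weekly_prediction_scorecard(id),
    channel TEXT NOT NULL CHECK (channel IN ('LINE', 'PLS_backend', 'email')),
    target_profile_ids TEXT NOT NULL,  -- JSON array of profile ids
    summary TEXT NOT NULL,
    requested_signal TEXT NOT NULL,
    sent_at TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all four tables. SQLite enforces REFERENCES only with PRAGMA foreign_keys=ON."""
    conn.executescript(SCHEMA)
```

An unknown gate value, or a `published_at` on a non-shipped scorecard, is rejected with `sqlite3.IntegrityError`.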

API

| API | Method | Purpose |
|---|---|---|
| /ai-prediction/weekly-scorecards | POST | Create scorecard from D7/D14 state. |
| /ai-prediction/weekly-scorecards/:id | GET | Read PLS backend card. |
| /ai-prediction/adoption-decisions | POST | Record owner decision and next worker action. |
| /ai-prediction/people-sync-events | POST | Prepare LINE/PLS sync payload. |
| /ai-prediction/weekly-scorecards/:id/recompute | POST | Recompute after repair/rerun. |
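The sketch below builds a request for `POST /ai-prediction/adoption-decisions` with the standard library but does not send it. The API table gives only paths and methods. The host name, and the idea that the JSON body mirrors the adoption_gate_decision columns, are illustrative assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://pls.example.com"  # hypothetical host; only the path comes from the API table
GATES = {"ship_weekly_scorecard", "repair_first", "blocked"}


def adoption_decision_request(scorecard_id: str, recommended_gate: str, final_gate: str,
                              decided_by: str, rationale: str, next_worker_action: str,
                              decided_at: datetime) -> urllib.request.Request:
    """Build (but do not send) a POST for /ai-prediction/adoption-decisions."""
    if recommended_gate not in GATES or final_gate not in GATES:
        raise ValueError("gate must be ship_weekly_scorecard, repair_first, or blocked")
    # Assumed body shape: one key per adoption_gate_decision column (id left to the server).
    body = {
        "scorecard_id": scorecard_id,
        "recommended_gate": recommended_gate,
        "final_gate": final_gate,
        "decided_by": decided_by,
        "rationale": rationale,
        "next_worker_action": next_worker_action,
        "decided_at": decided_at.isoformat(),
    }
    return urllib.request.Request(
        f"{BASE_URL}/ai-prediction/adoption-decisions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = adoption_decision_request(
    "scorecard-1", "repair_first", "repair_first", "louis-user-id",
    "unresolved_gap_rate 0.31 > 0.2", "repair_source_gap",
    datetime(2026, 6, 1, 1, 0, tzinfo=timezone.utc),
)
# urllib.request.urlopen(req) would send it; that step is left out here.
```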

Gate Logic

if routed_non_hit_rate < 1.0 -> repair_first
else if unresolved_gap_rate > 0.2 -> repair_first
else if unknown_rate >= 0.25 -> repair_first
else if reviewer_sample_count < 5 -> repair_first
else if reviewer_agreement_rate < 0.8 -> repair_first
else -> ship_weekly_scorecard

blocked is reserved for missing source data, missing owner decision, or PLS backend storage failure.
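The gate pseudocode maps onto an ordered, pure function, and that purity is what makes the gate deterministic (Test 2). In this sketch a missing snapshot stands in for "missing source data". The other blocked causes (missing owner decision, storage failure) would be detected outside this function. The names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MetricSnapshot:
    routed_non_hit_rate: float
    unresolved_gap_rate: float
    unknown_rate: float
    reviewer_sample_count: int
    reviewer_agreement_rate: float


# Same order as the pseudocode; the first failing rule wins and becomes the reason.
REPAIR_RULES: tuple[tuple[str, Callable[[MetricSnapshot], bool]], ...] = (
    ("routed_non_hit_rate < 1.0", lambda s: s.routed_non_hit_rate < 1.0),
    ("unresolved_gap_rate > 0.2", lambda s: s.unresolved_gap_rate > 0.2),
    ("unknown_rate >= 0.25", lambda s: s.unknown_rate >= 0.25),
    ("reviewer_sample_count < 5", lambda s: s.reviewer_sample_count < 5),
    ("reviewer_agreement_rate < 0.8", lambda s: s.reviewer_agreement_rate < 0.8),
)


def recommend_gate(snapshot: Optional[MetricSnapshot]) -> tuple[str, str]:
    """Return (gate, reason); no hidden state, so equal snapshots give equal gates."""
    if snapshot is None:
        return "blocked", "metric snapshot missing"
    for reason, failed in REPAIR_RULES:
        if failed(snapshot):
            return "repair_first", reason
    return "ship_weekly_scorecard", "all thresholds pass"


print(recommend_gate(MetricSnapshot(1.0, 0.2, 0.24, 5, 0.8)))
# ('ship_weekly_scorecard', 'all thresholds pass')
```

The returned reason can seed the `rationale` field, although the owner still writes the final rationale.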

Sync / Permissions / Audit

  • Worker can create scorecard and recommend gate, but cannot approve final gate.
  • Louis approves final_gate.
  • zihrou can update reviewer/rubric metrics.
  • iron can update source adapter gap metrics.
  • Every scorecard must preserve calibration run id, correction batch id, metric snapshot, decision record ref, actor, timestamp, and model/prompt version.
  • LINE must only send the short people sync. Durable evidence stays in PLS backend or shared artifact.
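The permission bullets can be sketched as a field-level write check with an audit trail. The role-to-field map is an assumption, in particular which metrics count as "source adapter gap metrics". A delegated owner would be one more key. A plain dict stands in for the scorecard, snapshot, and decision rows.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical field-level map derived from the bullets above.
WRITE_PERMISSIONS = {
    "worker": {"recommended_gate"},
    "Louis": {"final_gate"},
    "zihrou": {"reviewer_sample_count", "reviewer_agreement_rate"},
    "iron": {"unresolved_gap_rate"},  # assumption: the source-gap metric iron owns
}


def audited_write(record: dict[str, Any], actor: str, field: str, value: Any,
                  audit_log: list[dict[str, Any]]) -> None:
    """Apply one write if the actor may touch the field, and append an audit entry."""
    if field not in WRITE_PERMISSIONS.get(actor, set()):
        raise PermissionError(f"{actor} may not write {field}")
    record[field] = value
    audit_log.append({
        "actor": actor,
        "field": field,
        "value": value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
```

A worker that tries to set `final_gate` gets a `PermissionError`, and nothing is written.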

Decision Record: D30 Weekly Scorecard Adoption Gate

Date: 2026-05-24
Status: Recommended
Owner: Louis
Reviewers: zihrou / iron

Problem

The project now has D7 measurement and D14 repair routing. Without a D30 weekly scorecard, the system still depends on manual memory to decide whether AI prediction verification should ship, repair, or block.

Options Considered

Option A: Publish a dashboard immediately

Pros: visible artifact and easy to share.
Cons: skips owner approval and can expose unverified metrics as if they were management truth.

Option B: Keep generating production packs

Pros: low risk and easy to repeat.
Cons: does not change the operating rhythm. The project would keep producing files instead of decisions.

Option C: Create adoption gate before dashboard

Pros: converts D7/D14 evidence into a weekly decision system; keeps Louis as final approver; auto-routes repair work when gates fail.
Cons: requires PLS backend to store scorecard and decision records.

Recommendation

Choose Option C. It is the necessary bridge between production pack and operating system. The dashboard should come after the adoption gate, not before it.

Adopted Path

  1. Generate weekly scorecard from D7/D14 state.
  2. Compute deterministic recommended gate.
  3. Ask Louis to approve final gate.
  4. Push short LINE sync to Louis, zihrou, and iron.
  5. Dispatch next worker based on gate result.

If Rejected, Required Feedback

A reviewer who rejects this recommendation must specify which of these objections applies:

  • PLS backend cannot store weekly scorecard state.
  • Louis does not want to approve the gate manually.
  • The thresholds should change.
  • D7/D14 data is not ready.
  • This project should pause instead of moving to a weekly cadence.

Without one of these objections, proceed to D30 adoption gate and then dashboard/card implementation.

{
  "job_id": "614b8f50-12c0-421e-8db3-9ea13b8797c5",
  "project_topic": "AI prediction verification module for signals and action-item evidence",
  "current_artifact": "D30 Weekly Scorecard Adoption Gate",
  "previous_artifacts": [
    "D7 Calibration Run Control Tower",
    "D14 Correction Router and Weekly Scorecard Pack"
  ],
  "owner": "Louis",
  "reviewers": ["zihrou", "iron"],
  "due": "2026-06-14",
  "market_learning": [
    "Mature AI evaluation practice keeps traceable evidence, eval metrics, reviewer annotations, experiments, and production monitoring together.",
    "A scorecard should act as a release/adoption gate, not just a visual dashboard.",
    "Owner approval remains necessary when metrics influence management or workflow adoption."
  ],
  "next_worker_rule": {
    "if_scorecard_missing": "Create weekly_prediction_scorecard from latest D7 calibration run and D14 correction batch.",
    "if_recommended_gate_ship": "Build PLS backend weekly dashboard/card and LINE short summary.",
    "if_recommended_gate_repair_first_source": "Dispatch source adapter repair for iron before dashboard.",
    "if_recommended_gate_repair_first_rubric": "Dispatch rubric/reviewer repair for zihrou before dashboard.",
    "if_gate_blocked": "Ask Louis for decision or repair missing data storage before proceeding."
  },
  "acceptance_gate": {
    "routed_non_hit_rate": 1.0,
    "unresolved_gap_rate_max": 0.2,
    "unknown_rate_max": 0.25,
    "reviewer_sample_min": 5,
    "reviewer_agreement_target": 0.8
  },
  "do_not_repeat": [
    "Do not create another static production pack for this chain.",
    "Do not build dashboard before adoption_gate_decision exists.",
    "Do not send long evidence reports to LINE; keep durable evidence in PLS or artifact URL."
  ]
}
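Test 6 requires that the learning memory JSON validates. The sketch below is a minimal stdlib check. It assumes "validates" means three things: the text parses as JSON, it carries the top-level keys of the record above, and its acceptance_gate stays in sync with the Gate Logic thresholds. The function name is hypothetical.

```python
import json

REQUIRED_KEYS = {
    "job_id", "project_topic", "current_artifact", "previous_artifacts", "owner",
    "reviewers", "due", "market_learning", "next_worker_rule", "acceptance_gate",
    "do_not_repeat",
}
# Thresholds copied from the Gate Logic; the record must not drift from them.
EXPECTED_GATE = {
    "routed_non_hit_rate": 1.0,
    "unresolved_gap_rate_max": 0.2,
    "unknown_rate_max": 0.25,
    "reviewer_sample_min": 5,
    "reviewer_agreement_target": 0.8,
}


def learning_memory_errors(text: str) -> list[str]:
    """Return problems with a learning-memory JSON document; [] means it validates."""
    try:
        memory = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(memory, dict):
        return ["top level must be an object"]
    errors = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(memory))]
    gate = memory.get("acceptance_gate")
    if not isinstance(gate, dict):
        errors.append("acceptance_gate must be an object")
        return errors
    for key, expected in EXPECTED_GATE.items():
        if gate.get(key) != expected:
            errors.append(f"acceptance_gate.{key} should be {expected}")
    return errors
```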

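AI Prediction Verification D30 Weekly Scorecard Adoption Gate

Scenario

The previous two rounds established the D7 calibration run control tower and the D14 correction router. This round moves the project to D30: a scorecard and adoption gate that can be governed every week. The goal is not another document; it is to let PLS decide automatically each week whether to ship to the weekly report, repair first, or block.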

Owner: Louis
Reviewers: zihrou / iron
Due: 2026-06-14
Primary artifact: d30-weekly-scorecard-adoption-gate.html

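30-Day Development Path

| Milestone | Outcome | Acceptance |
|---|---|---|
| D1 | The D7/D14 artifact chain is traceable. | Run ids, route ids, evidence snapshot, and decision record all exist. |
| D7 | The weekly scorecard schema and PLS backend card are ready. | The scorecard can output the gate, metrics, and top repair backlog. |
| D14 | The adoption gate automatically dispatches the next round. | The three states ship_weekly_scorecard / repair_first / blocked work. |
| D30 | The weekly governance cadence is running. | LINE short summary, PLS backend card, and next-round worker rules are complete. |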

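Purpose-to-Purpose E2E

Original purpose: automatically check whether the AI predictions from the previous review were hits.
D30 purpose: turn hit verification into a weekly management decision instead of a one-off report.

E2E:

  1. The D7 run produces hit/miss/partial/unknown labels.
  2. D14 routes give every non-hit or disputed item an owner, due date, and acceptance criteria.
  3. The D30 scorecard aggregates quality, repair, human review, economics, and adoption metrics.
  4. Louis only needs to look at the adoption gate to decide: ship, repair, or block.
  5. Based on the gate, PLS automatically pushes the LINE summary, creates the next-round worker task, and retains learning memory.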
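Steps 1 to 3 of the E2E flow amount to aggregating per-item labels and route status into scorecard metrics. The sketch below assumes that each D7 item carries one of the four labels. It also assumes a boolean saying whether the item's D14 route has an owner, due date, and acceptance criteria. The function names are hypothetical.

```python
from collections import Counter

LABELS = ("hit", "miss", "partial", "unknown")


def quality_rates(labels: list[str]) -> dict[str, float]:
    """Turn D7 labels into hit/miss/partial/unknown rates for the scorecard."""
    if not labels:
        raise ValueError("no D7 labels to aggregate")
    counts = Counter(labels)
    unexpected = set(counts) - set(LABELS)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return {f"{label}_rate": counts[label] / len(labels) for label in LABELS}


def routed_non_hit_rate(items: list[tuple[str, bool]]) -> float:
    """Share of non-hit items whose D14 route is complete; items are (label, has_route)."""
    routed_flags = [has_route for label, has_route in items if label != "hit"]
    if not routed_flags:
        return 1.0  # assumption: with no non-hits there is nothing left to route
    return sum(routed_flags) / len(routed_flags)


print(quality_rates(["hit", "hit", "miss", "unknown"]))
# {'hit_rate': 0.5, 'miss_rate': 0.25, 'partial_rate': 0.0, 'unknown_rate': 0.25}
```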

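Value / Money Path

  • Revenue: extend only trusted AI review patterns to high-value workflows such as sales, customer complaints, special requests, and project governance.
  • Cost savings: replace scattered follow-ups and repeated reviews with the weekly scorecard, so reviewers handle only exceptions.
  • Risk reduction: when unresolved source/rubric gaps are too high, stop the metrics from being turned into a dashboard, avoiding misleading management signals.
  • Higher conversion: Louis gets clear ship/repair/block options every week, moving the AI project from pilot to management cadence.
  • Freed capacity: the PLS worker generates the next-round task automatically from the gate, so nobody has to rebuild the context by hand.

Upgrading People's Capabilities

  • Louis: moves from reading text reports to managing AI adoption through the gate.
  • zihrou: moves from case-by-case judgment to governing reviewer agreement and the rubric backlog.
  • iron: moves from answering data-gap questions to owning source adapter health.
  • PLS worker: moves from delivering documents to advancing the next round automatically based on the scorecard.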

Solution Stack

| Layer | Deliverable |
|---|---|
| Context framework | D7 calibration -> D14 correction -> D30 weekly scorecard -> adoption gate. |
| Workflow | ingest scorecard inputs -> calculate metrics -> decide gate -> people sync -> next worker task. |
| Data / DB | weekly_prediction_scorecard, scorecard_metric_snapshot, adoption_gate_decision, people_sync_event |
| Operational tools | HTML scorecard gate, schema/API, acceptance tests, LINE draft, learning memory. |
| Acceptance metrics | routed=100%, unresolved_gap<=20%, reviewer_agreement>=80%, unknown<25%. |
| Adoption upgrade | Pass: goes into the PLS weekly report. Fail: the repair backlog is dispatched automatically. Blocked: Louis is asked to decide. |

Market Context

In 2026, mature AI observability/evaluation practice combines traceable evidence, evals, annotations, and production monitoring instead of one-off analysis. The OpenTelemetry GenAI conventions provide a vocabulary for traces; Phoenix emphasizes traces, eval tests, and experiments; Evidently supports ML/LLM evaluation and monitoring; LangSmith emphasizes online evals in production. The D30 scorecard applies these ideas to PLS: every week, review the gate, metric snapshot, evidence refs, and people sync.

Market Context Sources

Checked on 2026-05-24 Asia/Taipei.

Sources

Takeaway

Current AI evaluation and observability practice emphasizes continuous monitoring, traceable evidence, eval datasets, reviewer annotations, and production scorecards. For PLS, the next production-grade step is a weekly adoption gate that turns D7/D14 evidence into a recurring ship/repair/block decision.
