@esz135888
Created May 23, 2026 20:53
PLS job 27bf4f7c AI prediction weekly scorecard dashboard implementation

Acceptance Tests

Test 1: Dashboard Card Render

Given a scorecard payload, when PLS loads /ai-prediction/scorecards/:id/card, then the card must render the gate header, metric strip, repair backlog, evidence drawer, and next worker action.

Pass:

  • all three states render: ship_weekly_scorecard, repair_first, blocked.
  • no text overflows in compact card mode.
  • evidence drawer is collapsed by default.
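A minimal Python sketch of the Test 1 render contract, assuming the card is built from the payload shape in the Payload Sketch. `Card`, `build_card`, and the zone keys are hypothetical names, not an existing PLS API; the publish rule comes from the ship/repair/block state descriptions in the dashboard.

```python
from dataclasses import dataclass, field

# The three gate states named in the acceptance tests.
GATE_STATES = ("ship_weekly_scorecard", "repair_first", "blocked")


@dataclass
class Card:
    gate: str
    zones: list = field(default_factory=list)
    evidence_drawer_open: bool = False  # collapsed by default (Test 1)
    publish_enabled: bool = False


def build_card(payload: dict) -> Card:
    """Hypothetical helper: map a scorecard payload onto the five card zones."""
    gate = payload["adoption_gate"]
    if gate not in GATE_STATES:
        raise ValueError(f"unknown adoption_gate: {gate!r}")
    return Card(
        gate=gate,
        zones=[
            "gate_header",
            "metric_strip",
            "repair_backlog",
            "evidence_drawer",
            "next_worker_action",
        ],
        # Only a passing gate exposes the publish action.
        publish_enabled=(gate == "ship_weekly_scorecard"),
    )


card = build_card({"adoption_gate": "repair_first"})
```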

Test 2: Owner Approval

Given a worker recommendation, when the final gate is submitted for approval, then only Louis or a delegated owner can approve it.

Pass:

  • non-owner approval rejected.
  • final gate decision stores actor and timestamp.
  • metric snapshot hash is stored.
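One way to satisfy Test 2, sketched in Python under the assumption that delegation is a plain allow-list passed in by the caller (the source only says "Louis or delegated owner"). `approve_final_gate` and `FinalGateDecision` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: Louis is the fixed owner; delegates are supplied per call.
FINAL_GATE_OWNERS = {"Louis"}


@dataclass(frozen=True)
class FinalGateDecision:
    scorecard_id: str
    gate: str
    actor: str
    decided_at: datetime
    metric_snapshot_hash: str


def approve_final_gate(scorecard_id, gate, actor, snapshot_hash, delegates=frozenset()):
    """Reject non-owners; otherwise record actor, timestamp, and snapshot hash."""
    if actor not in FINAL_GATE_OWNERS | set(delegates):
        raise PermissionError(f"{actor} may not approve the final gate")
    return FinalGateDecision(
        scorecard_id=scorecard_id,
        gate=gate,
        actor=actor,
        decided_at=datetime.now(timezone.utc),
        metric_snapshot_hash=snapshot_hash,
    )


decision = approve_final_gate("sc-1", "repair_first", "Louis", "abc123")
```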

Test 3: Repair Backlog Permissions

Given backlog route updates, when zihrou or iron updates a row, then each person can only update their allowed route type.

Pass:

  • zihrou can update rubric/reviewer items.
  • iron can update source adapter items.
  • worker cannot close a route without owner approval.
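The Test 3 permission rules can be expressed as a small lookup table. This Python sketch assumes `route_type` values follow the data-model enum (source / rubric / prompt / owner follow-up), that the PLS worker owns prompt routes, and that "closed" is a backlog status; `ROUTE_EDITORS` and `can_update` are hypothetical.

```python
# Route types each person may update, per Test 3 and Permissions / Audit.
# Assumption: route_type values come from the enum
# "source / rubric / prompt / owner follow-up", and "worker" owns prompt routes.
ROUTE_EDITORS = {
    "zihrou": {"rubric"},
    "iron": {"source"},
    "worker": {"prompt"},
}


def can_update(actor: str, route_type: str, new_status: str, owner_approved: bool = False) -> bool:
    """Return True if `actor` may set a backlog row of `route_type` to `new_status`."""
    # Test 3: a worker cannot close a route without owner approval.
    if actor == "worker" and new_status == "closed" and not owner_approved:
        return False
    # Everyone else is limited to the route types listed for them.
    return route_type in ROUTE_EDITORS.get(actor, set())
```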

Test 4: LINE Summary

Given a dashboard card, when LINE summary is generated, then it must be short and action-oriented.

Pass:

  • message includes gate, owner, requested signal, and top backlog owner.
  • message does not include raw evidence JSON.
  • long evidence remains in dashboard/artifact.
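A Python sketch of a LINE summary builder that meets Test 4 by reading only the four allowed fields, so evidence in the card never reaches the message. The 280-character limit and the `owner` and `requested_signal` card keys are assumptions; the source only says the message must be short.

```python
MAX_LINE_CHARS = 280  # assumption: the source only says the message must be short


def line_summary(card: dict) -> str:
    """Build a short, action-oriented LINE message from a dashboard card.

    Only four fields are read (gate, owner, requested signal, top backlog owner),
    so evidence payloads in the card can never leak into the message.
    """
    backlog = card.get("top_repair_backlog") or []
    top_owner = backlog[0]["owner"] if backlog else "none"
    text = (
        f"Gate: {card['adoption_gate']} | Owner: {card['owner']} | "
        f"Requested: {card['requested_signal']} | Top backlog: {top_owner}. "
        "Evidence is in the PLS dashboard."
    )
    if len(text) > MAX_LINE_CHARS:
        raise ValueError("LINE summary too long; keep detail in the dashboard")
    return text


msg = line_summary({
    "adoption_gate": "repair_first",
    "owner": "Louis",
    "requested_signal": "approve final gate",
    "top_repair_backlog": [{"owner": "iron", "route_type": "source_adapter_gap"}],
    "evidence": {"calibration_run_id": "d7-run", "samples": [1, 2, 3]},
})
```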

Test 5: Next Worker Dispatch

Given a final gate, when dispatch-next is called, then the next action must be deterministic.

Pass:

  • ship_weekly_scorecard creates publish/dashboard cadence task.
  • repair_first with source gap creates source adapter repair task for iron.
  • repair_first with rubric gap creates rubric repair task for zihrou.
  • blocked creates manual owner decision or missing data task.
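Determinism in Test 5 is easiest to guarantee with a fixed lookup table. This Python sketch uses the `recommended_action` values from `worker_dispatch_recommendation`; the "missing data" branch for blocked is omitted because that enum has no matching value, and assigning the blocked case to Louis is an assumption. `dispatch_next` is a hypothetical function name.

```python
from typing import Optional

# (gate, gap type) -> (recommended_action, assignee). Gap type only matters for repair_first.
DISPATCH_TABLE = {
    ("ship_weekly_scorecard", None): ("publish_dashboard", None),
    ("repair_first", "source"): ("repair_source_gap", "iron"),
    ("repair_first", "rubric"): ("repair_rubric", "zihrou"),
    ("blocked", None): ("manual_owner_decision", "Louis"),  # assumption: owner decides
}


def dispatch_next(gate: str, gap: Optional[str] = None) -> dict:
    """Map a final gate (plus gap type) to exactly one next task, or fail loudly."""
    key = (gate, gap if gate == "repair_first" else None)
    try:
        action, assignee = DISPATCH_TABLE[key]
    except KeyError:
        raise ValueError(f"no deterministic action for {key}") from None
    return {"recommended_action": action, "assignee": assignee, "status": "draft"}
```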

Test 6: Completion Evidence

Given this pack is complete, then the durable artifact, uploaded files, production readiness, E2E verification, people sync, learning memory, owner/due/acceptance, and decision record must all exist.

Pass:

  • public artifact URL returns HTTP 200.
  • 8 files uploaded to PLS.
  • learning memory JSON validates.
  • artifact JSON contains required PLS kinds.

Artifact URL or PR

Durable primary artifact:

https://gist.github.com/esz135888/e345d56c1b9d66222d0ab22c6a09e542

Gist id:

e345d56c1b9d66222d0ab22c6a09e542

Verification:

  • Public Gist URL responds with HTTP 200 after redirects.
  • File list includes dashboard HTML, production brief, data model, acceptance tests, decision record, learning memory, market sources, and this artifact record.

Dashboard Data Model / API / Sync / Permissions

Views / Tables

weekly_prediction_scorecard_view

| Field | Type | Purpose |
|---|---|---|
| scorecard_id | uuid | Card id. |
| project_id | uuid | PLS project. |
| week_start | date | Weekly cadence. |
| adoption_gate | enum | ship_weekly_scorecard, repair_first, blocked. |
| hit_rate / miss_rate / unknown_rate | decimal | Quality metrics. |
| routed_non_hit_rate | decimal | Repair completeness. |
| unresolved_gap_rate | decimal | Remaining source/rubric risk. |
| reviewer_agreement_rate | decimal | Human agreement. |
| manual_minutes_saved | integer | Cost saving estimate. |
| final_gate_decision_id | uuid | Owner approval if present. |

scorecard_repair_backlog_view

| Field | Type | Purpose |
|---|---|---|
| route_id | uuid | Correction route. |
| scorecard_id | uuid | Parent scorecard. |
| route_type | enum | source / rubric / prompt / owner follow-up. |
| owner_user_id | uuid | iron, zihrou, Louis delegate, or worker. |
| due_at | datetime | Repair due date. |
| severity | enum | P0 / P1 / P2. |
| acceptance_rule | text | Done definition. |
| evidence_refs | jsonb | Source refs. |

scorecard_evidence_drawer

| Field | Type | Purpose |
|---|---|---|
| scorecard_id | uuid | Parent scorecard. |
| calibration_run_id | uuid | D7 run. |
| correction_batch_id | uuid | D14 routes. |
| metric_snapshot_hash | text | Audit hash. |
| source_snapshot_at | datetime | Evidence boundary. |
| decision_record_ref | text | Decision record URL. |
| artifact_refs | jsonb | Gist / PLS files. |
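The data model only says `metric_snapshot_hash` is an audit hash. A reasonable assumption is SHA-256 over canonical JSON, so identical metrics always produce the same hash regardless of key order; this Python sketch shows that choice, and `metric_snapshot_hash` as a function name is hypothetical.

```python
import hashlib
import json


def metric_snapshot_hash(metrics: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys, no whitespace) of the metrics."""
    canonical = json.dumps(metrics, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


h = metric_snapshot_hash({"unknown_rate": 0.20, "routed_non_hit_rate": 1.0})
```

Sorting keys matters because the same metrics can arrive from different joins in different orders; without canonicalization the audit hash would change even when the numbers do not.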

worker_dispatch_recommendation

| Field | Type | Purpose |
|---|---|---|
| id | uuid | Recommendation id. |
| scorecard_id | uuid | Parent scorecard. |
| recommended_action | enum | publish_dashboard, repair_source_gap, repair_rubric, rerun_cohort, manual_owner_decision. |
| target_worker_kind | text | project_runner, scorecard_improvement, repo_change. |
| dispatch_prompt | text | Next worker instruction. |
| status | enum | draft, queued, claimed, completed, failed. |
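The `status` enum lists the states of a dispatch recommendation but not the allowed transitions. This Python sketch assumes a linear draft → queued → claimed path ending in completed or failed, with failed jobs allowed to be re-queued; `ALLOWED` and `advance` are hypothetical.

```python
# Assumed transitions; the source lists the states but not the edges between them.
ALLOWED = {
    "draft": {"queued"},
    "queued": {"claimed"},
    "claimed": {"completed", "failed"},
    "completed": set(),
    "failed": {"queued"},  # assumption: a failed job may be re-queued
}


def advance(status: str, new_status: str) -> str:
    """Return new_status if the transition is allowed, else raise ValueError."""
    if new_status not in ALLOWED.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status


s = "draft"
for nxt in ("queued", "claimed", "completed"):
    s = advance(s, nxt)
```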

API

| API | Method | Purpose |
|---|---|---|
| /ai-prediction/scorecards/:id/card | GET | Read dashboard card payload. |
| /ai-prediction/scorecards/:id/approve | POST | Louis or a delegated owner approves the final gate. |
| /ai-prediction/scorecards/:id/backlog/:route_id | PATCH | Update repair backlog status. |
| /ai-prediction/scorecards/:id/dispatch-next | POST | Create next worker job from gate. |
| /ai-prediction/scorecards/:id/line-summary | GET | Generate short LINE summary only. |
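To show how the route table could be dispatched, here is a framework-free Python sketch using `re.fullmatch` with named groups for `:id` and `:route_id`. The handler names are hypothetical placeholders, not existing PLS functions; a real backend would use its own router.

```python
import re

# (method, path pattern, handler name), mirroring the API table above.
ROUTES = [
    ("GET", r"/ai-prediction/scorecards/(?P<id>[^/]+)/card", "read_card"),
    ("POST", r"/ai-prediction/scorecards/(?P<id>[^/]+)/approve", "approve_final_gate"),
    ("PATCH", r"/ai-prediction/scorecards/(?P<id>[^/]+)/backlog/(?P<route_id>[^/]+)", "update_backlog"),
    ("POST", r"/ai-prediction/scorecards/(?P<id>[^/]+)/dispatch-next", "dispatch_next"),
    ("GET", r"/ai-prediction/scorecards/(?P<id>[^/]+)/line-summary", "line_summary"),
]


def resolve(method: str, path: str):
    """Return (handler name, path params) for a request, or None if nothing matches."""
    for route_method, pattern, handler in ROUTES:
        if route_method != method:
            continue
        match = re.fullmatch(pattern, path)
        if match:
            return handler, match.groupdict()
    return None
```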

Payload Sketch

{
  "scorecard_id": "uuid",
  "adoption_gate": "repair_first",
  "metrics": {
    "unknown_rate": 0.20,
    "routed_non_hit_rate": 1.0,
    "reviewer_agreement_rate": 0.86,
    "manual_minutes_saved": 180
  },
  "top_repair_backlog": [
    {
      "route_type": "source_adapter_gap",
      "owner": "iron",
      "due": "2026-06-21",
      "acceptance": "extractor version + source timestamp present"
    }
  ],
  "next_worker_action": "repair_source_gap"
}
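A Python sketch of checks a backend might run on this payload before rendering. The rules (gate in the three states, rates within [0, 1], non-negative integer minutes, complete backlog rows) are inferred from the data model; `validate_card_payload` is a hypothetical name.

```python
import json

GATES = {"ship_weekly_scorecard", "repair_first", "blocked"}
RATE_KEYS = ("unknown_rate", "routed_non_hit_rate", "reviewer_agreement_rate")


def validate_card_payload(raw: str) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    data = json.loads(raw)
    problems = []
    if data.get("adoption_gate") not in GATES:
        problems.append("adoption_gate must be one of the three card states")
    metrics = data.get("metrics", {})
    for key in RATE_KEYS:
        value = metrics.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be a rate in [0, 1]")
    minutes = metrics.get("manual_minutes_saved")
    if not isinstance(minutes, int) or minutes < 0:
        problems.append("manual_minutes_saved must be a non-negative integer")
    # Every backlog row needs an owner, due date, and acceptance rule.
    for row in data.get("top_repair_backlog", []):
        for key in ("route_type", "owner", "due", "acceptance"):
            if not row.get(key):
                problems.append(f"backlog row missing {key}")
    return problems
```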

Permissions / Audit

  • Louis can approve final gate and publish card.
  • zihrou can update rubric/reviewer backlog items.
  • iron can update source adapter backlog items.
  • PLS worker can recommend dispatch but cannot approve final gate.
  • Each state transition stores actor, timestamp, metric snapshot hash, calibration run id, correction batch id, and decision record ref.
  • LINE sync is generated from /line-summary; it must not include raw evidence payloads.

PLS Backend Flow

  1. Load weekly_prediction_scorecard_view.
  2. Join the repair backlog and evidence drawer.
  3. Render the card state.
  4. If the final gate is missing, ask Louis for approval.
  5. If approved and the gate is ship_weekly_scorecard, publish the weekly card.
  6. If the gate is repair_first or blocked, call dispatch-next with a deterministic worker prompt.
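The six backend steps can be sketched as one Python function. Loading and joining are passed in as callables because the real PLS data access is not specified here; `weekly_card_flow` and the returned step names are hypothetical.

```python
from typing import Callable, Optional, Tuple


def weekly_card_flow(
    scorecard_id: str,
    load_view: Callable[[str], dict],
    join_details: Callable[[dict], dict],
    final_gate: Optional[str] = None,
) -> Tuple[dict, str]:
    """Run steps 1-6 and return the rendered card plus the next step taken."""
    card = join_details(load_view(scorecard_id))  # steps 1-2: load view, join backlog/evidence
    card["state"] = card["adoption_gate"]          # step 3: render card state
    if final_gate is None:
        return card, "request_approval_from_louis"  # step 4
    if final_gate == "ship_weekly_scorecard":
        return card, "publish_weekly_card"          # step 5
    return card, "call_dispatch_next"               # step 6: repair_first or blocked
```

Injecting the loaders keeps the gate logic testable without a database, which matters because the decision record notes the scorecard data may not yet be persisted.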

Decision Record: PLS Weekly Scorecard Dashboard Card

Date: 2026-05-24
Status: Recommended
Owner: Louis
Reviewers: zihrou / iron

Problem

The project has D7 measurement, D14 repair routing, and a D30 adoption gate. The missing piece is an implementation-ready PLS backend card that turns the gate into a weekly operating surface.

Options Considered

Option A: Continue with static production packs

Pros: low implementation risk.
Cons: the system remains document-driven and does not enter an operating cadence.

Option B: Build a standalone dashboard outside PLS

Pros: fast visual demo.
Cons: weak adoption, because approval, LINE sync, worker dispatch, and audit already live in PLS.

Option C: Define PLS backend card contract

Pros: connects scorecard, approval, people sync, repair backlog, and next worker dispatch in the system where work already flows.
Cons: requires backend implementation after this pack.

Recommendation

Choose Option C. It is the correct bridge from artifact chain to production workflow.

Adoption State

Recommended for PLS backend implementation. If the repo path is available, the next worker should implement the card in the repo; otherwise, it should create a backend issue/PR task that uses this pack as the spec.

Landing Path

  1. Add scorecard card payload API.
  2. Render dashboard card in PLS backend.
  3. Enforce owner approval.
  4. Add LINE summary endpoint.
  5. Wire final gate to next worker dispatch.

Feedback Needed If Rejected

The reviewer must specify which of the following applies:

  • PLS backend repo is unavailable.
  • scorecard data is not persisted.
  • owner approval should happen outside PLS.
  • LINE sync should be handled by a different service.
  • this chain should pause before implementation.
{
  "job_id": "27bf4f7c-9165-426d-a09c-a610407ad706",
  "project_topic": "AI prediction verification module for signals and action-item evidence",
  "current_artifact": "PLS Weekly Scorecard Dashboard Implementation Pack",
  "previous_artifacts": [
    "D7 Calibration Run Control Tower",
    "D14 Correction Router and Weekly Scorecard Pack",
    "D30 Weekly Scorecard Adoption Gate"
  ],
  "owner": "Louis",
  "reviewers": ["zihrou", "iron"],
  "due": "2026-06-21",
  "market_learning": [
    "AI observability systems become valuable when eval results are tied to production monitoring and action workflows.",
    "A PLS-native card is more adoptable than a standalone dashboard because approval, people sync, and worker dispatch already live in PLS.",
    "The next improvement should be implementation or PR, not another static artifact pack."
  ],
  "next_worker_rule": {
    "if_repo_available": "Implement PLS backend weekly scorecard card and API using this contract.",
    "if_repo_unavailable": "Create a repo_change/github_pr task with this artifact as spec.",
    "if_scorecard_data_missing": "Backfill weekly_prediction_scorecard_view from D7/D14/D30 artifacts.",
    "if_gate_ship": "Publish card and schedule weekly re-run.",
    "if_gate_repair_or_blocked": "Dispatch source/rubric repair or owner decision task."
  },
  "acceptance_gate": {
    "card_states_render": ["ship_weekly_scorecard", "repair_first", "blocked"],
    "owner_approval_required": true,
    "line_summary_short_only": true,
    "next_worker_dispatch_deterministic": true
  },
  "do_not_repeat": [
    "Do not create another static D7/D14/D30 pack.",
    "Do not build a standalone dashboard that bypasses PLS approval and dispatch.",
    "Do not push raw evidence into LINE."
  ]
}
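Test 6 requires that the learning memory JSON validates. A lightweight Python check against the keys this file actually uses might look like the sketch below; the required-key sets are an assumption drawn from this file rather than from a published schema, and `check_learning_memory` is hypothetical.

```python
import json

# Assumption: required keys are taken from this learning memory file itself.
REQUIRED_KEYS = {
    "job_id", "project_topic", "current_artifact", "owner", "reviewers",
    "due", "next_worker_rule", "acceptance_gate", "do_not_repeat",
}
REQUIRED_RULES = {
    "if_repo_available", "if_repo_unavailable", "if_scorecard_data_missing",
    "if_gate_ship", "if_gate_repair_or_blocked",
}


def check_learning_memory(raw: str) -> list:
    """Parse the JSON (raises json.JSONDecodeError if malformed) and list problems."""
    data = json.loads(raw)
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(data))]
    rules = data.get("next_worker_rule", {})
    problems += [f"missing rule: {r}" for r in sorted(REQUIRED_RULES - set(rules))]
    if data.get("acceptance_gate", {}).get("owner_approval_required") is not True:
        problems.append("owner_approval_required must be true")
    return problems
```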
<!doctype html>
<html lang="zh-Hant">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>PLS AI Prediction Verification Weekly Scorecard Dashboard</title>
<style>
:root{--ink:#16212b;--muted:#607080;--line:#d8dee7;--bg:#f6f8fb;--panel:#fff;--blue:#2457d6;--green:#14805a;--amber:#a35f00;--red:#b42318;--soft:#eef2f7}
*{box-sizing:border-box}body{margin:0;background:var(--bg);color:var(--ink);font-family:Inter,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;line-height:1.5}
header{background:#fff;border-bottom:1px solid var(--line);padding:22px clamp(18px,4vw,48px)}main{padding:22px clamp(18px,4vw,48px) 48px;display:grid;gap:16px}
h1{margin:0;font-size:clamp(24px,3vw,36px);letter-spacing:0}h2{margin:0 0 12px;font-size:18px}h3{margin:0 0 8px;font-size:15px}p{margin:0 0 10px}
.sub{color:var(--muted);max-width:1120px;margin-top:8px}.tag{display:inline-flex;align-items:center;height:24px;padding:0 8px;border:1px solid var(--line);border-radius:999px;background:#fbfcfe;color:var(--muted);font-size:12px;margin-right:6px}
.grid{display:grid;gap:16px}.cols4{grid-template-columns:repeat(4,minmax(0,1fr))}.cols3{grid-template-columns:repeat(3,minmax(0,1fr))}.cols2{grid-template-columns:repeat(2,minmax(0,1fr))}
.panel{background:var(--panel);border:1px solid var(--line);border-radius:8px;padding:16px}.metric{min-height:112px;display:flex;flex-direction:column;justify-content:space-between}.label{font-size:13px;color:var(--muted)}.value{font-size:30px;font-weight:760}.ok{color:var(--green)}.warn{color:var(--amber)}.stop{color:var(--red)}
table{width:100%;border-collapse:collapse;font-size:13px}th,td{border-bottom:1px solid var(--line);padding:10px 8px;text-align:left;vertical-align:top}th{background:#fbfcfe;color:var(--muted)}code{background:var(--soft);border-radius:4px;padding:2px 5px;font-size:12px}
.bar{height:10px;background:var(--soft);border-radius:999px;overflow:hidden}.fill{height:100%;background:var(--blue);width:72%}.fill.green{background:var(--green);width:86%}.fill.amber{background:var(--amber);width:20%}.fill.red{background:var(--red);width:14%}
.state{border-left:4px solid var(--blue);padding-left:12px}.state.warn{border-color:var(--amber)}.state.stop{border-color:var(--red)}
ul{padding-left:18px;margin:0}li{margin:6px 0}@media(max-width:980px){.cols4,.cols3,.cols2{grid-template-columns:1fr}}
</style>
</head>
<body>
<header>
<h1>PLS Weekly Scorecard Dashboard</h1>
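<p class="sub">Turns the D30 adoption gate into a weekly card the PLS backend can implement: management reads the gate, reviewers read the pending repair backlog, and the worker reads the next round of dispatch. LINE receives only a short summary; evidence stays in the backend and the artifact.</p>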
<p><span class="tag">Owner: Louis</span><span class="tag">Review: zihrou / iron</span><span class="tag">Due: 2026-06-21</span><span class="tag">Target: backend-ready card</span></p>
</header>
<main>
<section class="grid cols4">
<div class="panel metric"><span class="label">Adoption Gate</span><span class="value ok">ship</span><span class="label">or repair_first / blocked</span></div>
<div class="panel metric"><span class="label">Unknown Rate</span><span class="value warn">20%</span><div class="label"><div class="bar"><div class="fill amber"></div></div></div></div>
<div class="panel metric"><span class="label">Routed Non-hit</span><span class="value ok">100%</span><div class="label"><div class="bar"><div class="fill green" style="width:100%"></div></div></div></div>
<div class="panel metric"><span class="label">Open Repair</span><span class="value stop">4</span><span class="label">source/rubric routes</span></div>
</section>
<section class="grid cols2">
<div class="panel">
<h2>Dashboard Card Layout</h2>
<table>
<tr><th>Zone</th><th>Purpose</th><th>User Action</th></tr>
<tr><td>Gate Header</td><td>show ship / repair_first / blocked</td><td>Louis approves final gate</td></tr>
<tr><td>Metric Strip</td><td>unknown, routed, agreement, savings</td><td>spot threshold break</td></tr>
<tr><td>Top Repair Backlog</td><td>rank source/rubric fixes</td><td>zihrou/iron take owner tasks</td></tr>
<tr><td>Evidence Drawer</td><td>run ids, route ids, sample refs</td><td>audit without LINE noise</td></tr>
<tr><td>Next Worker Action</td><td>dispatch repair/dashboard/rerun</td><td>PLS creates next job</td></tr>
</table>
</div>
<div class="panel">
<h2>Purpose-to-Purpose E2E</h2>
<ul>
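<li>Original purpose: verify whether AI review predictions hit.</li>
<li>Output: PLS weekly scorecard card + API spec + worker dispatch rules.</li>
<li>Human adoption: Louis reads the gate, zihrou/iron read the backlog, and the worker picks up the next action.</li>
<li>Metric improvement: lower the unknown rate, shorten the repair cycle, reduce manual review, and increase adoptable prediction patterns.</li>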
</ul>
</div>
</section>
<section class="panel">
<h2>Card States</h2>
<div class="grid cols3">
<div class="state"><h3>ship_weekly_scorecard</h3><p>All thresholds pass. Shows the publish button, a LINE draft, and next week's re-run schedule.</p></div>
<div class="state warn"><h3>repair_first</h3><p>Data is usable but thresholds are not met. Shows the top repair backlog; publishing is disabled.</p></div>
<div class="state stop"><h3>blocked</h3><p>A run, source, or owner decision is missing. Shows only the gaps, owners, and due dates; the weekly report cannot go out.</p></div>
</div>
</section>
<section class="grid cols2">
<div class="panel">
<h2>Top Repair Backlog</h2>
<table>
<tr><th>Route</th><th>Owner</th><th>Due</th><th>Acceptance</th></tr>
<tr><td>source_adapter_gap: action item stale</td><td>iron</td><td>2026-06-21</td><td>extractor version + source timestamp present</td></tr>
<tr><td>rubric_fix: ambiguous success criteria</td><td>zihrou</td><td>2026-06-21</td><td>10-case historical recheck passes</td></tr>
<tr><td>model_prompt_fix: weak-signal overprediction</td><td>PLS worker</td><td>2026-06-21</td><td>before/after eval delta positive</td></tr>
</table>
</div>
<div class="panel">
<h2>Value / Money Path</h2>
<ul>
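<li>Revenue: extend trusted prediction patterns to high-value workflows such as sales, customer complaints, and special requests.</li>
<li>Cost savings: the dashboard surfaces the exception backlog directly, reducing case-by-case follow-up questions from reviewers.</li>
<li>Risk reduction: the blocked state keeps AI results that lack evidence out of the management weekly report.</li>
<li>Freed capacity: the next worker action automatically dispatches repair/rerun/dashboard tasks instead of relying on people to carry context forward.</li>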
</ul>
</div>
</section>
<section class="grid cols3">
<div class="panel"><h2>Data / API</h2><p>Uses <code>weekly_prediction_scorecard_view</code>, <code>scorecard_repair_backlog_view</code>, <code>scorecard_evidence_drawer</code>, and <code>worker_dispatch_recommendation</code>. API: <code>GET /ai-prediction/scorecards/:id/card</code>, <code>POST /ai-prediction/scorecards/:id/approve</code>, <code>POST /ai-prediction/scorecards/:id/dispatch-next</code>.</p></div>
<div class="panel"><h2>Permissions / Audit</h2><p>Only Louis (or a delegated owner) can approve the gate; zihrou/iron can update only their own backlog items; the worker can only recommend dispatch. Every action records the actor, metric snapshot hash, run ids, and route ids.</p></div>
<div class="panel"><h2>Human Capability</h2><p>Turns "reading a pile of files" into "reviewing the gate, backlog, and next action". People make the judgments; AI organizes evidence, dispatches work, and keeps the cadence going.</p></div>
</section>
<section class="panel">
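<h2>LINE Draft</h2>
<p>This week's AI prediction verification scorecard is ready as a PLS backend card. Louis, please choose ship/repair/block. If repair_first, zihrou reviews the rubric backlog and iron reviews the source adapter backlog; if ship, PLS re-runs automatically next week and reports any gate change.</p>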
</section>
</main>
</body>
</html>
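The static dashboard fixes bar widths in CSS, so a bar can disagree with the number printed next to it. A backend rendering the metric strip would more plausibly derive the width from the payload value. This Python sketch assumes the same CSS classes as the HTML above; `metric_tile` is a hypothetical helper, and `html.escape` guards the label.

```python
from html import escape

TONES = {"ok", "warn", "stop"}  # value color classes defined in the dashboard CSS


def metric_tile(label: str, rate: float, tone: str) -> str:
    """Render one metric tile whose bar width is computed from the rate."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone!r}")
    pct = round(max(0.0, min(rate, 1.0)) * 100)  # clamp to [0, 1], then percent
    return (
        f'<div class="panel metric"><span class="label">{escape(label)}</span>'
        f'<span class="value {tone}">{pct}%</span>'
        f'<div class="label"><div class="bar">'
        f'<div class="fill" style="width:{pct}%"></div></div></div></div>'
    )


tile = metric_tile("Routed Non-hit", 1.0, "ok")
```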

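PLS AI Prediction Verification Weekly Scorecard Dashboard Implementation Pack

Scenario

The previous round completed the D30 Weekly Scorecard Adoption Gate. This round advances the gate into a dashboard/card spec that the PLS backend can implement: card states, data views, API, permissions, audit, short LINE sync, and next worker dispatch.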

Owner: Louis
Reviewers: zihrou / iron
Due: 2026-06-21
Primary artifact: pls-weekly-scorecard-dashboard.html

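30-Day Development Path

| Milestone | Outcome | Acceptance |
|---|---|---|
| D1 | Dashboard card layout and API contract finalized. | Card zones, states, API, and permission matrix are complete. |
| D7 | The PLS backend can consume weekly_prediction_scorecard_view. | A sample payload renders ship/repair/block. |
| D14 | LINE sync and next worker dispatch are wired to the gate. | LINE pushes only short summaries; the worker automatically picks up repair/rerun/dashboard tasks. |
| D30 | The weekly operating view runs on a fixed cadence. | Louis can approve the gate every week; zihrou/iron repair items from the backlog. |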

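Purpose-to-Purpose E2E

Original purpose: verify whether AI review predictions hit.
This round's purpose: turn verification results into a weekly card that operates in the PLS backend, so people can make decisions and dispatch work directly.

E2E:

  1. D7/D14/D30 evidence flows into the scorecard views.
  2. The PLS dashboard card shows the gate, metric strip, top repair backlog, evidence drawer, and next worker action.
  3. Louis approves ship/repair/block.
  4. zihrou/iron receive only the backlog items they need to handle.
  5. The worker creates a dashboard publish, repair, rerun, or blocked-repair task based on the final gate.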

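Value / Money Path

  • Revenue: trusted AI prediction patterns can be extended to high-value workflows such as sales, customer complaints, special requests, and project governance.
  • Cost savings: case-by-case follow-up by reviewers becomes exception management and a top backlog.
  • Risk reduction: the blocked state keeps AI conclusions that lack evidence out of the management weekly report.
  • Higher conversion: Louis only needs to approve the gate on the card each week, which lowers adoption friction.
  • Freed capacity: next worker dispatch lets PLS continue automatically instead of relying on people to reassemble context.

Raising Human Capability

  • Louis: from reading multiple artifacts to reviewing the gate and top risks.
  • zihrou: from ad hoc review to owning the rubric backlog.
  • iron: from passively backfilling data to owning source adapter health.
  • PLS worker: from producing files to operating the weekly cadence and dispatching repairs.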

Solution Stack

| Layer | Content |
|---|---|
| Context framework | D7/D14/D30 evidence -> dashboard card -> people sync -> next worker dispatch. |
| Workflow | calculate card -> approve gate -> sync people -> dispatch next -> audit trail. |
| Data / DB | weekly_prediction_scorecard_view, scorecard_repair_backlog_view, scorecard_evidence_drawer, worker_dispatch_recommendation |
| Operational tools | HTML dashboard artifact, API spec, acceptance tests, decision record, learning memory. |
| Acceptance metrics | Card renders all three states, owner approval is enforced, LINE sync stays short, dispatch action is deterministic. |
| Adoption upgrade | If it passes, move into PLS backend implementation; if it fails, dispatch source/rubric repair instead of producing more static packs. |

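Market Context

Mature AI observability/evaluation toolchains connect traces, evals, annotations, experiments, and monitoring into a closed loop. This PLS dashboard follows that practice: the goal is not to add attractive charts but to turn evidence and the scorecard into a release/adoption workflow.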

Market Context Sources

Checked on 2026-05-24 Asia/Taipei.

Sources

Takeaway

Comparable mature systems connect evaluation traces, reviewer annotations, monitoring, and production action. For PLS, this means the scorecard should live inside the backend workflow with approval, audit, LINE summary, and next worker dispatch, not as a disconnected visual report.
