@esz135888
Last active May 24, 2026 02:36

PLS Tokenmaxxing Value Metric Gate production pack

Acceptance Tests

Production Acceptance

  1. Blocked metric cannot enter performance scorecard.

    • Given metric_key=token_consumption
    • When HR/performance admin attempts to add it to individual scorecard
    • Then API returns 403 blocked_metric_for_performance
  2. Diagnostic-only metrics can appear only as cost context.

    • Given tool_active_days or token_cost_diagnostics
    • When dashboard renders scorecard
    • Then it is shown under diagnostic context, not ranking or bonus fields.
  3. Approved outcome metric requires evidence.

    • Given a proposed task_completion_rate
    • When evidence_url, baseline, owner, formula, quality rubric are missing
    • Then review status is needs_revision.
  4. Gaming signal triggers review.

    • Given token cost increases by more than 2x and outcome metrics do not improve
    • When nightly worker sync runs
    • Then a gaming_signals row is created with signal_type cost_spike_no_outcome and severity medium or high.
  5. Exception is time-boxed.

    • Given a blocked metric exception
    • When expires_at is missing
    • Then API rejects the request.
  6. Audit export is complete.

    • Given any metric policy status change
    • When audit report is requested
    • Then actor, reason, previous_state, next_state, timestamp and source are present.
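
As an illustration of tests 1 and 3 above, the checks could be sketched roughly as follows. This is a minimal sketch, not the pack's implementation: the in-memory `POLICY_STATUS` dict stands in for `metric_registry`, and the function names, the `metric_not_approved_for_performance` code and the `ready_for_review` status are assumptions made for the example.

```python
# Hypothetical in-memory stand-in for the metric_registry table.
POLICY_STATUS = {
    "token_consumption": "blocked",
    "tool_active_days": "diagnostic_only",
    "task_completion_rate": "approved",
}

# Evidence fields named in acceptance test 3.
REQUIRED_EVIDENCE = ("evidence_url", "baseline", "owner", "formula", "quality_rubric")


def add_to_individual_scorecard(metric_key: str) -> tuple[int, str]:
    """Return (http_status, code) for adding a metric to an individual scorecard."""
    status = POLICY_STATUS.get(metric_key, "pending")
    if status == "blocked":
        # Test 1: blocked metrics are rejected with 403.
        return 403, "blocked_metric_for_performance"
    if status != "approved":
        # Assumed code for diagnostic-only and pending metrics.
        return 403, "metric_not_approved_for_performance"
    return 200, "ok"


def review_status(proposal: dict) -> str:
    """Test 3: a proposal missing any required evidence field needs revision."""
    missing = [key for key in REQUIRED_EVIDENCE if not proposal.get(key)]
    # "ready_for_review" is an assumed label for the complete case.
    return "needs_revision" if missing else "ready_for_review"
```

Unknown metric keys are treated as pending here, so they are rejected by default rather than silently accepted.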

E2E Verification This Round

  • Primary artifact is HTML, not Markdown/JSON.
  • External market inputs include TechRadar, Indeed, SPACE, DORA.
  • Production path includes data model, API, permissions, audit, rollback.
  • People sync includes owner, ask, due date, and escalation copy.
  • Decision record exists.

Artifact URL or PR

Primary artifact: https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c#file-tokenmaxxing-value-metric-gate-html

Public Gist: https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c

Verification commands:

  • curl -I -L -s "https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c#file-tokenmaxxing-value-metric-gate-html" | head -n 8
  • gh gist view f71a4ceb8b83607167440a353141da8c --files

Verification result: primary URL returned HTTP/2 200; public Gist includes 13 files.

Data Model

Tables

metric_registry

Field | Type | Notes
id | uuid | Primary key
metric_key | text | token_consumption, task_completion_rate
metric_name | text | Human-readable name
policy_status | enum | blocked, diagnostic_only, approved, pending
allowed_use | text[] | cost_diagnostic, workflow_scorecard, executive_report
prohibited_use | text[] | performance_rank, bonus, layoff_screen, individual_leaderboard
rationale | text | Why this policy exists
owner_id | uuid | Governance owner
updated_at | timestamptz | Audit timestamp

metric_gate_reviews

Field | Type | Notes
id | uuid | Primary key
proposed_metric_key | text | Candidate metric
proposed_by | uuid | Team owner
workflow_id | uuid | Related workflow
outcome_linkage_score | numeric | 0-100
gaming_risk_score | numeric | 0-100
data_quality_score | numeric | 0-100
decision | enum | approve, reject, diagnostic_only, needs_revision
decision_reason | text | Required
approved_by | uuid | Governance owner

workflow_outcome_snapshots

Field | Type | Notes
id | uuid | Primary key
workflow_id | uuid | Workflow under measurement
baseline_period | daterange | Before AI baseline
measurement_period | daterange | Current period
task_completion_delta | numeric | Completion lift
validated_time_saved_minutes | integer | Verified sample
quality_score_delta | numeric | Rubric-based
customer_satisfaction_delta | numeric | If applicable
evidence_url | text | Review/sample evidence

token_cost_diagnostics

Field | Type | Notes
id | uuid | Primary key
workflow_id | uuid | Diagnostic target
tokens_in | bigint | Cost input
tokens_out | bigint | Cost output
cost_usd | numeric | Cost estimate
linked_outcome_snapshot_id | uuid | Nullable
diagnostic_only | boolean | Must be true

gaming_signals

Field | Type | Notes
id | uuid | Primary key
workflow_id | uuid | Target
signal_type | enum | cost_spike_no_outcome, automation_loop, low_quality_high_usage
severity | enum | low, medium, high
evidence_json | jsonb | Query result
status | enum | open, investigating, resolved, false_positive

metric_exceptions

Field | Type | Notes
id | uuid | Primary key
metric_key | text | Blocked/diagnostic metric
requested_use | text | Why exception is requested
risk_tier | enum | low, medium, high
approved_by | uuid | Must be governance owner
expires_at | timestamptz | Required
audit_note | text | Required
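
The "Required" notes on metric_exceptions could be enforced before a row is written. The sketch below is one possible validation, assuming the request arrives as a plain dict; the function name, the error strings and the rule that `expires_at` must lie in the future are assumptions for illustration, not part of the spec.

```python
from datetime import datetime, timezone
from typing import Optional

REQUIRED_FIELDS = ("metric_key", "requested_use", "risk_tier", "expires_at", "audit_note")
RISK_TIERS = ("low", "medium", "high")


def validate_exception_request(payload: dict, now: Optional[datetime] = None) -> list[str]:
    """Return validation errors; an empty list means the request can be accepted."""
    now = now or datetime.now(timezone.utc)
    errors = [f"missing_{name}" for name in REQUIRED_FIELDS if not payload.get(name)]

    risk_tier = payload.get("risk_tier")
    if risk_tier and risk_tier not in RISK_TIERS:
        errors.append("invalid_risk_tier")

    expires_at = payload.get("expires_at")
    if isinstance(expires_at, datetime):
        if expires_at.tzinfo is None:
            # timestamptz implies a timezone-aware value.
            errors.append("expires_at_not_timezone_aware")
        elif expires_at <= now:
            # Assumed rule: an exception that has already expired is rejected.
            errors.append("expires_at_in_past")
    return errors
```

A request without `expires_at` yields `["missing_expires_at"]`, which maps to acceptance test 5 (the API rejects exceptions that are not time-boxed).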

API / Sync

  • POST /metric-gate/proposals: submit candidate metric with workflow, formula, data source, proposed use.
  • POST /metric-gate/reviews: approve/reject/diagnostic-only decision.
  • GET /metric-gate/scorecard?workflow_id=: returns approved outcome metrics plus diagnostic cost context.
  • POST /metric-gate/exceptions: time-boxed exception request.
  • POST /metric-gate/gaming-signals/sync: worker writes suspicious usage/outcome mismatch.
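
To make the scorecard endpoint's contract concrete, here is a rough sketch of how its response body might be assembled from registry rows. The response keys `outcome_metrics` and `diagnostic_context` and the function name are hypothetical; the spec only states that approved outcome metrics are returned together with diagnostic cost context.

```python
def build_scorecard(metrics: list[dict]) -> dict:
    """Split registry rows into outcome metrics and diagnostic context.

    Each row is assumed to carry `metric_key` and `policy_status`.
    Blocked and pending metrics are intentionally left out of the response.
    """
    scorecard = {"outcome_metrics": [], "diagnostic_context": []}
    for metric in metrics:
        status = metric["policy_status"]
        if status == "approved":
            scorecard["outcome_metrics"].append(metric["metric_key"])
        elif status == "diagnostic_only":
            scorecard["diagnostic_context"].append(metric["metric_key"])
    return scorecard


rows = [
    {"metric_key": "task_completion_rate", "policy_status": "approved"},
    {"metric_key": "tool_active_days", "policy_status": "diagnostic_only"},
    {"metric_key": "token_consumption", "policy_status": "blocked"},
]
print(build_scorecard(rows))
```

Keeping the two lists in separate keys means a dashboard cannot rank on a diagnostic metric by accident; it has to read from a differently named field.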

Permissions

  • Team owner: propose metric, view own workflow scorecard.
  • Governance owner: approve/reject metrics, approve exceptions, close high severity gaming signals.
  • Finance owner: view token cost diagnostics, cannot convert diagnostic metric into performance metric.
  • HR/performance admin: consume approved scorecard only; blocked metrics are hidden from performance selector.
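
The role rules above can be read as an allow-list. The following sketch encodes them that way; the role and action identifiers are made up for the example and would need to match whatever names the Operating Console actually uses.

```python
# Hypothetical role and action names derived from the permission bullets.
PERMISSIONS = {
    "team_owner": {"propose_metric", "view_own_scorecard"},
    "governance_owner": {
        "review_metric",
        "approve_exception",
        "close_high_severity_signal",
    },
    "finance_owner": {"view_token_cost_diagnostics"},
    "hr_performance_admin": {"consume_approved_scorecard"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are rejected."""
    return action in PERMISSIONS.get(role, set())
```

Because the table is an allow-list, the finance owner cannot promote a diagnostic metric simply because no such action is granted to that role.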

Audit / Rollback

Every status change writes audit_events: actor, action, previous_state, next_state, reason, source, created_at. Rollback means reverting metric status to previous approved policy and invalidating scorecard exports created during the invalid window.
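
A status change that always leaves an audit record might look like the sketch below. It assumes an in-memory registry dict and audit list in place of the real tables; `AuditEvent` mirrors the audit_events fields listed above, while `change_status` and the `metric_status_change` action label are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    action: str
    previous_state: str
    next_state: str
    reason: str
    source: str
    created_at: datetime


def change_status(registry: dict, audit_log: list, metric_key: str,
                  next_state: str, actor: str, reason: str, source: str) -> AuditEvent:
    """Update a metric's policy status and append the matching audit event."""
    if not reason:
        raise ValueError("reason is required for every status change")
    event = AuditEvent(
        actor=actor,
        action="metric_status_change",  # assumed action label
        previous_state=registry[metric_key],
        next_state=next_state,
        reason=reason,
        source=source,
        created_at=datetime.now(timezone.utc),
    )
    registry[metric_key] = next_state
    audit_log.append(event)
    return event
```

Recording `previous_state` on every event is what makes the rollback described above possible: the prior policy can be read back from the log instead of being reconstructed.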

Decision Record

Decision

Adopt governance / eval / system spec as the solution shape for the Tokenmaxxing project, with a primary HTML artifact named Tokenmaxxing Value Metric Gate.

Options Considered

  1. Communication only

    • Pros: Fast; managers are simply told not to use token metrics.
    • Cons: Does not prevent repeated misuse or metric gaming.
    • Decision: Not enough for production delivery.
  2. Scorecard only

    • Pros: Gives replacement metrics.
    • Cons: Without a gate, blocked metrics can still leak into performance systems.
    • Decision: Useful component, but incomplete alone.
  3. Governance / eval / system spec

    • Pros: Blocks misuse, defines allowed metrics, detects gaming, creates audit trail, can enter PLS Operating Console.
    • Cons: Requires more setup across data, permissions and workflow owners.
    • Decision: Recommended.

Recommendation

Ship the Metric Gate as the operating policy and implementation spec. Token consumption remains diagnostic-only for cost visibility; productivity, budget, promotion, staffing and AI adoption decisions must use verified outcome metrics.

Adoption Status

Ready for D7 pilot with 3 AI workflows.

Landed Path

  1. Add metric registry and gate review tables.
  2. Add blocked metric selector to Operating Console.
  3. Sync token diagnostics separately from performance scorecards.
  4. Run gaming detector nightly.
  5. Export scorecard only after governance approval.

Feedback If Not Adopted

If leadership still wants to track usage, keep it team-level and diagnostic-only, remove individual leaderboards, and require an outcome metric beside every usage metric.

E2E Verification

Verification Plan

  1. Publish primary HTML and appendices to public Gist.
  2. Verify primary URL returns HTTP 200.
  3. Verify Gist includes all 13 required files.
  4. Upload files to PLS deliverable id.
  5. Complete job with stable public artifact URLs.

Expected Artifact

Primary URL: https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c#file-tokenmaxxing-value-metric-gate-html

Evidence

  • Published public Gist: https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c
  • Primary artifact: tokenmaxxing-value-metric-gate.html
  • Verification command: curl -I -L -s "https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c#file-tokenmaxxing-value-metric-gate-html" | head -n 8
  • File list command: gh gist view f71a4ceb8b83607167440a353141da8c --files
  • Result: primary URL returned HTTP/2 200; Gist file list showed all 13 files.

Acceptance Mapping

  • Openable main artifact: tokenmaxxing-value-metric-gate.html.
  • Owner/due/acceptance: in production-brief.md and HTML.
  • E2E evidence: this file.
  • Data/toolbox upgrade path: data-model.md, production-readiness.md.
  • Decision record: decision-record.md.
{
  "project": "Tokenmaxxing Value Metric Gate",
  "market_learning": [
    "Visible AI usage leaderboards can create perverse incentives and inflate token consumption without real productivity.",
    "Mature measurement practice uses balanced outcome and reliability metrics rather than single activity proxies.",
    "Token cost is still useful as diagnostic data, but it must be separated from performance or adoption ranking."
  ],
  "pls_next_checks": [
    "Does any AI adoption dashboard expose individual token or prompt rankings?",
    "Can HR/performance workflows select diagnostic-only metrics?",
    "Does every AI value metric have baseline, formula, owner, evidence and anti-gaming rule?",
    "Are token cost spikes compared with workflow outcome movement?"
  ],
  "assumptions": [
    "PLS can add a metric registry and gate review workflow to the Operating Console.",
    "Workflow owners can provide outcome evidence for at least three pilot workflows by D7."
  ],
  "next_iteration": "Build Operating Console UI selector and worker gaming detector for blocked metric enforcement."
}

Market Maturity

External Sources

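  1. TechRadar, 2026-05-14: Amazon workers are reportedly tokenmaxxing AI platforms to hit usage targets.

    • URL: https://www.techradar.com/pro/amazon-workers-are-apparently-tokenmaxxing-ai-platforms-to-hit-arbitrary-usage-targets
  2. Indeed, 2026: Measuring AI productivity should focus on business outcomes rather than token counts or usage proxies.

    • URL: https://www.indeed.com/news/releases/measuring-ai-productivity-business-outcomes?co=US
  3. SPACE framework.

    • URL: https://space-framework.com/
    • Relevance: Mature engineering/productivity measurement balances satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow.
  4. DORA metrics guide.

    • URL: https://dora.dev/guides/dora-metrics/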

PLS Gap

PLS can already generate production packs, but the metric governance pattern needs to be codified so that future AI adoption dashboards cannot accidentally promote token consumption as a performance measure.

This Round Closes

  • Defines blocked / diagnostic-only / approved metric states.
  • Adds data model and API for metric gate reviews.
  • Adds gaming signals and audit.
  • Adds people sync and acceptance criteria.

Remaining Gap

Next round should implement a real Operating Console selector that prevents blocked metrics from being selected in performance views.

People Sync

Target Owners

  • AI governance owner: approve metric policy and exception rules.
  • Finance owner: validate token cost diagnostics as cost-only context.
  • HR/performance policy owner: confirm blocked metrics cannot enter individual performance.
  • Workflow owners: provide baseline and outcome evidence for D7 pilot.
  • PLS platform owner: plan Operating Console selector and worker sync.

LINE Draft

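This round turns the Tokenmaxxing reminder into a production-grade Metric Gate. The point is not to ban looking at token cost; it is to mark token consumption, prompt count and AI usage leaderboards as either cost diagnostics or blocked, so they cannot enter individual performance reviews, rankings, bonuses or layoff screening.

Before D7, please pick 3 AI workflows for the pilot. Each workflow needs a baseline, completion rate, time saved, a quality/defect or customer outcome, and an owner.

If anyone still wants to use token usage as an adoption metric, reply with the intended use and we will run an exception review. The default is team-level diagnostic only, with no individual ranking.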

Escalation

If no workflow owner has replied by D7, escalate to the AI governance owner to assign the pilot workflows, so the project does not stall at the documentation stage.

Production Brief

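Scenario

Project: AI self-built project: Amazon measuring the value of AI usage by token consumption leads employees to game the metric (Tokenmaxxing).

Positioning this round: upgrade "don't use tokens as an AI value metric" from a reminder to a production-ready Metric Gate, so that PLS or a company's internal Operating Console can block bad metrics from entering performance reviews, budgets, adoption leaderboards or the management cadence.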

D1 / D7 / D14 / D30

  • D1: Create the blocked metric registry and mark token consumption, prompt count and AI usage leaderboards as blocked or diagnostic-only.
  • D7: Onboard 3 AI workflow pilots; each workflow must have a baseline, outcome metric, quality rubric, owner and data source.
  • D14: Add the gaming detector to catch cost spikes without outcome lift, automation loops, and low quality with high usage.
  • D30: Integrate with PLS Operating Console / worker eval so that metric proposals, approvals, exceptions, audits and scorecard exports are all traceable.

Purpose-to-Purpose E2E

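Original purpose: keep Tokenmaxxing from distorting AI adoption metrics.

Deliverables: Value Metric Gate HTML, data model, API/sync spec, acceptance tests, people sync, learning memory, decision record.

Human adoption: managers replace usage leaderboards with outcome scorecards; employees no longer need to churn out pointless prompts to look AI-native.

Project/money/risk improvement: AI budget flows to workflows that raise completion rates, save time, reduce defects and improve customer satisfaction; high-cost, low-outcome usage gets diagnosed or cut off.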

Owner / Due / Acceptance

  • Owner: AI governance owner.
  • Supporting owners: Finance owner, workflow owner, HR/performance policy owner, PLS platform owner.
  • Due: D7 metric gate pilot.
  • Acceptance: a blocked metric cannot be selected as a performance or reward/penalty metric; every allowed metric must have a formula, data source, baseline, owner, quality threshold and anti-gaming check.

External Market Inputs

  • TechRadar, 2026-05-14: Amazon workers reportedly tokenmaxxing under usage tracking and leaderboards.
  • Indeed, 2026: AI productivity should be measured by business outcomes rather than usage or token counts.
  • SPACE / DORA: comparable maturity patterns emphasize balanced outcome and reliability metrics, not raw activity volume.

Production Readiness

Ready

  • Primary artifact: openable HTML Metric Gate.
  • Required appendices: production brief, data model, acceptance tests, decision record.
  • Governance: blocked, diagnostic-only and approved metric policy.
  • Eval: pass/fail acceptance tests and gaming detector signals.
  • Data path: registry, reviews, outcomes, diagnostics, exceptions, audit events.
  • People path: owner, due date, LINE draft and escalation.

Not Yet Deployed

  • No live PLS database migration was applied in this round.
  • No Operating Console UI code was changed.
  • No nightly worker was deployed.

Production Path

  1. Add schema migration for the tables in data-model.md.
  2. Add API routes for proposal, review, scorecard, exception and gaming sync.
  3. Add UI selector that hides blocked metrics from HR/performance use.
  4. Add nightly worker to compare token diagnostics with outcome snapshots.
  5. Add audit export and rollback job.
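
Step 4 of the path above (the nightly comparison of token diagnostics with outcome snapshots) could start from a rule like the one sketched here. The 2x threshold comes from acceptance test 4; the 4x cut-off between medium and high severity, the function name and the choice of which deltas count as "improvement" are assumptions for illustration only.

```python
from typing import Optional


def detect_cost_spike(baseline_cost_usd: float, current_cost_usd: float,
                      task_completion_delta: float,
                      quality_score_delta: float) -> Optional[dict]:
    """Return a gaming_signals-style row, or None when no signal applies."""
    if baseline_cost_usd <= 0:
        return None  # no usable baseline to compare against
    ratio = current_cost_usd / baseline_cost_usd
    outcome_improved = task_completion_delta > 0 or quality_score_delta > 0
    if ratio <= 2.0 or outcome_improved:
        return None  # cost did not more than double, or outcomes moved up
    # Assumed severity rule: 4x or more is high, otherwise medium.
    severity = "high" if ratio >= 4.0 else "medium"
    return {
        "signal_type": "cost_spike_no_outcome",
        "severity": severity,
        "status": "open",
        "evidence_json": {"cost_ratio": round(ratio, 2)},
    }
```

A workflow whose cost goes from 100 to 250 USD with flat outcomes produces a medium signal; the same spike with any positive completion delta produces none.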

Rollback

If a metric is later found invalid, set status to blocked, invalidate related scorecard exports, notify workflow owners, and rerun prior-period scorecards using approved metrics only.
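
The first two rollback steps could be sketched as below, assuming scorecard exports are available as dicts with `id`, `metric_keys`, `created_at` and a `valid` flag. Those field names and the `rollback_metric` helper are hypothetical; notifying owners and rerunning prior-period scorecards are left out.

```python
from datetime import datetime, timezone


def rollback_metric(registry: dict, exports: list[dict],
                    metric_key: str, invalid_from: datetime) -> list[str]:
    """Block the metric and invalidate exports that used it in the invalid window.

    Returns the ids of the invalidated exports so owners can be notified.
    """
    registry[metric_key] = "blocked"
    invalidated = []
    for export in exports:
        used_metric = metric_key in export["metric_keys"]
        in_window = export["created_at"] >= invalid_from
        if used_metric and in_window:
            export["valid"] = False
            invalidated.append(export["id"])
    return invalidated
```

Exports created before `invalid_from`, or that never included the metric, are left untouched.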

Skill Usage

Selected Skills / Tools

  • using-superpowers: checked session skill discipline before execution.
  • Web search: used to validate current Tokenmaxxing market context and comparable AI/productivity measurement practice.
  • Shell helper: used PLS fixed helper commands for doctor, touch, claim, context/progress retry, upload and complete.
  • apply_patch: used for file creation to keep workspace edits explicit.
  • GitHub Gist via gh: used to publish an openable primary HTML artifact and appendices.
  • curl / gh gist view: used to verify primary artifact URL and file list.

Evidence

  • doctor returned worker health and token presence.
  • claim returned job 4ff76bcb-3c8d-474a-b151-abb372dc83aa.
  • Primary artifact is tokenmaxxing-value-metric-gate.html.
  • Published Gist: https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c
  • Verification command: curl -I -L -s "https://gist.github.com/esz135888/f71a4ceb8b83607167440a353141da8c#file-tokenmaxxing-value-metric-gate-html" | head -n 8.

Why These Tools

The task required a production pack, not only analysis. Web search supplied current market context; the local file build supplied the actual deliverable; the Gist supplied a stable, openable artifact; and the PLS helper performed the writeback.

Solution Selection

Selected Types

  • governance
  • eval
  • system
  • project

Why This Combination

Tokenmaxxing is not just a messaging problem. It is an incentive design and measurement governance problem. A communication script can warn people, but it cannot stop a dashboard or HR workflow from using the wrong metric. A document can explain the risk, but it cannot enforce policy. A scorecard helps, but without a gate, bad metrics still slip into rankings.

The right production unit is a Metric Gate:

  • governance defines what is blocked, diagnostic-only, or approved.
  • eval defines pass/fail criteria and gaming detection.
  • system defines schema, API, permissions and audit.
  • project defines D1/D7/D14/D30 adoption path.

Why Not Smaller

Communication or doc-only output would not satisfy the production requirement because the repeated failure mode is system-level metric gaming.

Why Not Larger

A fully deployed multi-tenant system is too much to deliver in a single heartbeat. The production-ready pack defines the system boundary, data model, API, acceptance and adoption path so the next round can implement or connect it.

<!doctype html>
<html lang="zh-Hant">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Tokenmaxxing Value Metric Gate</title>
<style>
:root{--ink:#14212b;--muted:#5f6f7f;--line:#d8e2eb;--paper:#f7f9fc;--card:#fff;--blue:#2255c7;--green:#08785e;--amber:#9a6200;--red:#b13a22;--violet:#6d3cc2}
*{box-sizing:border-box}body{margin:0;background:var(--paper);color:var(--ink);font-family:Inter,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;line-height:1.5}
header{background:#fff;border-bottom:1px solid var(--line);padding:30px clamp(18px,4vw,56px)}main{padding:24px clamp(18px,4vw,56px) 48px}.grid{display:grid;gap:16px}
h1{margin:0 0 12px;font-size:clamp(31px,4vw,54px);line-height:1.04;letter-spacing:0;max-width:1120px}h2{margin:0 0 12px;font-size:22px}h3{margin:0 0 6px;font-size:16px}p{margin-top:0}code{background:#eef3f8;border-radius:4px;padding:1px 5px}
.sub{max-width:1120px;color:var(--muted);font-size:17px}.kpis{grid-template-columns:repeat(4,minmax(0,1fr));margin-top:22px}.two{grid-template-columns:1.05fr .95fr}.three{grid-template-columns:repeat(3,minmax(0,1fr))}.four{grid-template-columns:repeat(4,minmax(0,1fr))}.flow{grid-template-columns:repeat(5,minmax(0,1fr))}
.card{background:var(--card);border:1px solid var(--line);border-radius:8px;padding:18px;box-shadow:0 1px 2px rgba(20,33,43,.04)}.metric{font-size:34px;font-weight:780}.label{color:var(--muted);font-size:13px}.pill{display:inline-flex;border:1px solid var(--line);border-radius:999px;padding:4px 10px;font-size:12px;background:#fff;margin:0 6px 8px 0;white-space:nowrap}
.ok{color:var(--green)}.warn{color:var(--amber)}.bad{color:var(--red)}.info{color:var(--blue)}.gate{border-left:4px solid var(--blue)}.blocked{border-left:4px solid var(--red)}.allowed{border-left:4px solid var(--green)}
table{width:100%;border-collapse:collapse;font-size:14px}th,td{text-align:left;padding:10px;border-bottom:1px solid var(--line);vertical-align:top}th{color:var(--muted);font-size:12px;text-transform:uppercase}.badcell{color:var(--red);font-weight:730}.goodcell{color:var(--green);font-weight:730}.warncell{color:var(--amber);font-weight:730}
.step{border:1px solid var(--line);border-radius:8px;padding:12px;min-height:132px;background:#fbfdff}.step strong{display:block;color:var(--violet);margin-bottom:6px}.source a{color:var(--blue);word-break:break-word}
@media(max-width:960px){.kpis,.two,.three,.four,.flow{grid-template-columns:1fr}h1{font-size:34px}}
</style>
</head>
<body>
<header>
<span class="pill info">PLS production delivery iteration</span><span class="pill ok">Solution: governance / eval / system spec</span>
<h1>Tokenmaxxing Value Metric Gate</h1>
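<p class="sub">Upgrades "don't measure AI value by token consumption" into a deployable metric admission system: before any AI usage metric enters performance reviews, budgets or adoption leaderboards, it must pass outcome linkage, anti-gaming, data reliability, permission audit and exception approval checks.</p>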
<section class="grid kpis">
<div class="card"><div class="metric bad">BLOCK</div><div class="label">Token / prompt / leaderboard metrics must not enter individual performance</div></div>
<div class="card"><div class="metric ok">PASS</div><div class="label">Task completion, time saved, quality, customer outcomes</div></div>
<div class="card"><div class="metric warn">3</div><div class="label">Gaming signals: cost spike without outcome lift, automation loop, low quality with high usage</div></div>
<div class="card"><div class="metric">D30</div><div class="label">Integrate with PLS Operating Console / worker eval</div></div>
</section>
</header>
<main class="grid">
<section class="grid two">
<div class="card gate">
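<h2>Production task this round</h2>
<p>The previous version defined the value scorecard; this round advances it into a "metric gate": managers cannot put AI usage directly into rankings or rewards and penalties. They must first show that the metric is linked to real workflow outcomes and that anti-gaming checks, permissions, audit, exceptions and rollback are in place.</p>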
<span class="pill">Owner: AI governance owner</span><span class="pill">Due: D7 metric gate pilot</span><span class="pill">Acceptance: blocked metric cannot be selected</span>
</div>
<div class="card">
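<h2>What people will do with it</h2>
<p>When a manager submits an AI adoption metric, the system classifies it as <strong>blocked</strong>, <strong>diagnostic-only</strong> or an <strong>approved outcome metric</strong>; what employees see is not "who burned the most tokens" but "which workflows actually save time, raise quality and reduce risk".</p>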
</div>
</section>
<section class="card">
<h2>D1 / D7 / D14 / D30</h2>
<div class="grid four">
<div class="card gate"><h3>D1</h3><p>Create the blocked metric registry: token consumption, prompt count and usage leaderboards may not be used for performance by default.</p></div>
<div class="card gate"><h3>D7</h3><p>Complete the metric gate pilot: all 3 AI workflows need a baseline, outcome, quality rubric and owner.</p></div>
<div class="card gate"><h3>D14</h3><p>Add the gaming detector: high token cost with no outcome lift, repeated automation loops, low quality with high usage.</p></div>
<div class="card gate"><h3>D30</h3><p>Integrate with PLS Operating Console and write the blocked metric selector, exception audit and eval report into the back-office workflow.</p></div>
</div>
</section>
<section class="card">
<h2>Purpose-to-Purpose E2E</h2>
<div class="grid flow">
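<div class="step"><strong>Original purpose</strong>Keep Tokenmaxxing from distorting AI adoption metrics.</div>
<div class="step"><strong>Deliverables</strong>Metric gate, schema, API, acceptance tests, people sync, decision record.</div>
<div class="step"><strong>Human adoption</strong>Managers decide with outcome scorecards; employees no longer inflate usage for a leaderboard.</div>
<div class="step"><strong>Metric improvement</strong>AI cost is tied to completion rate, quality, satisfaction and risk reduction.</div>
<div class="step"><strong>Money path</strong>Stop high-cost, low-value usage and steer budget to workflows that save labor hours or generate revenue.</div>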
</div>
</section>
<section class="grid two">
<div class="card">
<h2>Metric Gate Rules</h2>
<table>
<thead><tr><th>Metric</th><th>Status</th><th>Admission rule</th></tr></thead>
<tbody>
<tr><td>Token consumption</td><td class="badcell">Blocked</td><td>Cost diagnostics only; must not enter individual performance, rankings, bonuses or layoff screening.</td></tr>
<tr><td>Prompt count</td><td class="badcell">Blocked</td><td>May be used for troubleshooting; must not serve as a productivity proxy.</td></tr>
<tr><td>Tool active days</td><td class="warncell">Diagnostic-only</td><td>Indicates adoption exposure only; must be paired with an outcome to enter a scorecard.</td></tr>
<tr><td>Task completion rate</td><td class="goodcell">Approved</td><td>Requires a definition of done, review sample, baseline and exception handling.</td></tr>
<tr><td>Validated time saved</td><td class="goodcell">Approved</td><td>Requires a workflow baseline and sampled verification; self-reporting alone is not accepted.</td></tr>
<tr><td>Quality / defect rate</td><td class="goodcell">Approved</td><td>Requires a rubric, a reviewer, and customer or internal quality evidence.</td></tr>
</tbody>
</table>
</div>
<div class="card">
<h2>Production Stack</h2>
<p><strong>Tables:</strong> <code>metric_registry</code>, <code>metric_gate_reviews</code>, <code>workflow_outcome_snapshots</code>, <code>token_cost_diagnostics</code>, <code>gaming_signals</code>, <code>metric_exceptions</code>, <code>audit_events</code>.</p>
<p><strong>API:</strong> <code>POST /metric-gate/proposals</code>, <code>POST /metric-gate/reviews</code>, <code>GET /metric-gate/scorecard</code>, <code>POST /metric-gate/exceptions</code>, <code>POST /metric-gate/gaming-signals/sync</code>.</p>
<p><strong>Permissions:</strong> team owner can propose; governance owner can approve; finance sees cost diagnostics only; HR/performance admin cannot select blocked metrics.</p>
<p><strong>Audit:</strong> every metric status change, exception approval and scorecard export is written to <code>audit_events</code>.</p>
</div>
</section>
<section class="grid three">
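<div class="card allowed"><h2>Value / money path</h2><p>Keeps AI budget from being swallowed by "activity volume" by tying cost to completion rate, time saved, defect reduction, customer satisfaction, and revenue or risk reduction.</p></div>
<div class="card allowed"><h2>People capability uplift</h2><p>Managers learn to design AI metrics that are hard to game; employees understand that AI adoption is about better outcomes, not burning tokens.</p></div>
<div class="card allowed"><h2>Next-round upgrade</h2><p>Build the metric gate into the Operating Console UI: metric proposals, blocked selector, exception approval, gaming alerts.</p></div>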
</section>
<section class="card source">
<h2>External market maturity inputs</h2>
<p>TechRadar reports that Amazon employees engaged in tokenmaxxing in response to AI usage leaderboards and token tracking, which shows that "usage rankings" create perverse incentives: <a href="https://www.techradar.com/pro/amazon-workers-are-apparently-tokenmaxxing-ai-platforms-to-hit-arbitrary-usage-targets">TechRadar, 2026-05-14</a>.</p>
<p>Indeed's view on AI productivity argues for looking at business outcomes rather than proxies such as usage or token counts: <a href="https://www.indeed.com/news/releases/measuring-ai-productivity-business-outcomes?co=US">Indeed, 2026</a>.</p>
<p>The SPACE framework and DORA metrics are comparable mature practices: they balance multidimensional engineering effectiveness with delivery reliability instead of relying on a single activity-volume metric. <a href="https://space-framework.com/">SPACE</a> / <a href="https://dora.dev/guides/dora-metrics/">DORA</a></p>
</section>
</main>
</body>
</html>