Every round of this KPI work exposed the same pattern: measurement was bolted on after the fact.
- Features shipped → THEN metrics chosen → THEN discovered they were wrong
- Metrics chosen → THEN validated → THEN discovered they didn't predict outcomes
- Dashboard built → THEN reviewed → THEN discovered it "makes no sense"
- MRR computed → THEN challenged → THEN discovered it was off by $4K
The fix isn't "add a field to the /launch template." The fix is making measurement thinking happen before code is written, not after.
Current text says "start with KPIs." That's circular — you pick KPIs, then build. The new text should force the outcome→proxy→measurement chain:
### Outcome-First Planning (Critical)
Every non-trivial feature must answer BEFORE implementation begins:
1. **What outcome do we care about?** (retention, revenue, trust, marketplace efficiency)
2. **What proxy predicts that outcome?** Cite the validation evidence.
   If no evidence exists, the first task is running the analysis — not building the feature.
3. **What will we measure, at what cadence, with what target?**
   Source must be identified. Query must be executable. Target must be evidence-based.
4. **What decision will we make based on the result?**
   "If metric X doesn't move by Y in Z weeks, we will [specific action]."
If a plan can't answer #2 with evidence, it needs a Phase 0: Proxy Validation.
The 8 validated proxies for RotatingRoom (April 2026):
- Listing liquidity (≥1 inquiry in 14d) → predicts lister retention (19.9% → 33.7%)
- Reply rate → predicts thread completion (0% → 40.4%) + searcher re-engagement
- Median response time → predicts thread completion (40.4% at <1h → 18.2% at >48h)
- MRR (Stripe, with discounts) → direct revenue measure
- Monthly churn (Stripe canceled_at) → direct retention measure
- Annual plan % → predicts revenue durability (50.7% vs 12.3% retention)
- Searchers who contacted fraudsters → direct trust/harm measure
- Organic → signup → message funnel → SEO investment ROI

For Strategic Plans, add a mandatory section BEFORE the implementation phases:
## Phase 0: Measurement Validation (if proxy not already validated)
Before any implementation work:
1. State the ultimate outcome this plan serves
2. Identify the candidate proxy metric
3. Run the SQL/API analysis that validates the proxy predicts the outcome (see the sketch after this list)
4. Document: "At threshold X, the outcome changes by Y% (cohort of N, period Z)"
5. If the proxy doesn't validate, STOP — reconsider the plan or find a better proxy
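As a concrete illustration of step 3, here is a minimal sketch of the cohort comparison that checks whether a candidate proxy actually predicts the outcome. The row shape, the names (`CohortRow`, `validateProxy`), and the example numbers in comments are illustrative assumptions, not existing code.

```typescript
// proxy-validation.ts: hypothetical sketch; adapt the row shape to the real warehouse query.

type CohortRow = {
  hitProxy: boolean; // e.g. listing received ≥1 inquiry within 14 days
  outcome: boolean;  // e.g. lister still active at the retention horizon
};

function validateProxy(rows: CohortRow[]) {
  const rate = (subset: CohortRow[]) =>
    subset.length === 0 ? 0 : subset.filter((r) => r.outcome).length / subset.length;

  const withProxy = rows.filter((r) => r.hitProxy);
  const withoutProxy = rows.filter((r) => !r.hitProxy);

  return {
    n: rows.length,
    outcomeWithProxy: rate(withProxy),       // e.g. 0.337
    outcomeWithoutProxy: rate(withoutProxy), // e.g. 0.199
    lift: rate(withProxy) - rate(withoutProxy),
  };
}
```

The output is exactly what step 4 documents: "At threshold X, the outcome changes by Y% (cohort of N, period Z)."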
Skip Phase 0 only if: the proxy is one of the 8 already-validated core metrics, OR the feature is infrastructure with no business outcome to measure.

Current Dimension 7 is "Success Criteria" — it checks "is there a metric?" The new version checks "is there a VALIDATED metric?"
### 7. Measurement Validation (upgraded from Success Criteria)
- Does the plan state an ultimate outcome? (not just "improve X" but "retention" or "revenue")
- Is the proposed proxy metric validated against that outcome with data?
- If not validated, does Phase 0 exist to validate before implementation?
- Is the measurement source Stripe/SQL/PostHog (source of truth) or an approximation?
- Could this metric go green while the outcome stays flat? (the false-positive test)
- Is the target evidence-based (from analysis) or a guess?
GAP if: no proxy validation evidence AND no Phase 0 to obtain it.
CONCERN if: proxy is reasonable but not validated (acceptable for small features).
PASS if: proxy validated with data, or feature uses an already-validated core metric.

Current template requires metrics[] with source/query/target. New template adds the chain:
```json
{
  "business_question": "Q1-Q6",
  "ultimate_outcome": "lister retention",
  "proxy_metric": "listing liquidity (≥1 inquiry in 14d)",
  "proxy_validation": "Analysis 1: 19.9% retention at 0 inquiries → 33.7% at ≥1 (n=2,707, Oct 2025-Mar 2026)",
  "metrics": [{ "id": "...", "source": "sql", "query": "...", "target": 60 }],
  "decision_rule": "If liquidity doesn't increase from 55.7% to 60% within 30 days, pivot to demand-gen"
}
```

If proxy_validation is empty, the Codex adversarial review (Step 6.5) should flag it as the first challenge.
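A minimal sketch of what that first challenge could look like in code, assuming the launch entry is parsed into an object with the fields above. The `CORE_PROXIES` ids, the `firstChallenge` name, and the entry shape are illustrative, not an existing hook in the review skill.

```typescript
// Hypothetical pre-review gate for Step 6.5; field names mirror the template above.
const CORE_PROXIES = new Set([
  "listing-liquidity", "reply-rate", "median-response-time", "mrr",
  "monthly-churn", "annual-plan-pct", "fraud-contacts", "organic-funnel",
]);

type LaunchEntry = {
  ultimate_outcome?: string;
  proxy_metric?: string;
  proxy_validation?: string;
};

function firstChallenge(entry: LaunchEntry, proxyId?: string): string | null {
  // Already-validated core metrics don't need fresh validation evidence.
  if (proxyId && CORE_PROXIES.has(proxyId)) return null;
  if (!entry.ultimate_outcome) return "No ultimate outcome stated.";
  if (!entry.proxy_metric) return "No proxy metric proposed.";
  if (!entry.proxy_validation || entry.proxy_validation.trim() === "") {
    return "proxy_validation is empty: demand the Phase 0 analysis before reviewing anything else.";
  }
  return null; // measurement chain is complete
}
```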
Don't wait for a "50-feature mapping exercise" — do it at registration time. When /launch registers a feature, it should automatically:
- Check which of the 28 metrics (8 heroes + 20 subs) this feature is expected to move
- Record the baseline value of those metrics at registration time (see the sketch after this list)
- At graduation, compare current value to baseline
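A sketch of what that registration-time hook could look like, in the spirit of scripts/kpi-refresh.js. The `fetchMetricValue` helper and the entry shape are assumptions; the real resolution of a metric id to its source-of-truth value lives wherever the 28 metrics are already computed.

```typescript
// launch-baseline.ts: hypothetical registration hook; adapt names to the real /launch entry shape.

type LaunchRegistration = {
  feature: string;
  expected_impact_on: string[]; // e.g. ["listing-liquidity", "reply-rate"]
  baseline_at_launch?: Record<string, number>;
};

// Assumed helper: resolves a metric id to its current value from the source of truth (SQL/Stripe/PostHog).
declare function fetchMetricValue(metricId: string): Promise<number>;

async function captureBaselines(entry: LaunchRegistration): Promise<LaunchRegistration> {
  const baseline: Record<string, number> = {};
  for (const metricId of entry.expected_impact_on) {
    baseline[metricId] = await fetchMetricValue(metricId); // e.g. 55.7 for listing-liquidity
  }
  return { ...entry, baseline_at_launch: baseline };
}

// At graduation, diff the same metrics against baseline_at_launch to see whether the feature moved them.
```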
This turns the mapping from a one-time audit into a continuous signal. Add to the /launch template:
"expected_impact_on": ["listing-liquidity", "reply-rate"],
"baseline_at_launch": {"listing-liquidity": 55.7, "reply-rate": 51.3}The thumbs up/down on sub-metrics is a good signal, but it needs a consumer. Add a monthly process:
On the 1st of each month:
- Review thumbs up/down votes on all 20 sub-metrics
- For any metric with net-negative votes: investigate why it's not useful
- For any metric that hasn't been looked at (no votes): is it still relevant?
- Run the 3-question exercise on any new features shipped that month
- Check: do the 28 metrics still cover the current priorities?
This prevents metric drift — the slow accumulation of metrics nobody looks at.
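If the votes are persisted anywhere queryable, the first two checks of that review can be automated. A sketch assuming a simple vote record shape; how and where votes are actually stored is not specified here.

```typescript
// Hypothetical monthly vote rollup; the Vote shape is an assumption about how thumbs are stored.
type Vote = { metricId: string; value: 1 | -1 };

// Returns metric ids with a net-negative vote and ids nobody has voted on at all.
function monthlyVoteReview(allMetricIds: string[], votes: Vote[]) {
  const net = new Map<string, number>();
  for (const id of allMetricIds) net.set(id, 0);

  const voted = new Set<string>();
  for (const v of votes) {
    net.set(v.metricId, (net.get(v.metricId) ?? 0) + v.value);
    voted.add(v.metricId);
  }

  return {
    netNegative: [...net].filter(([, n]) => n < 0).map(([id]) => id),
    unvoted: allMetricIds.filter((id) => !voted.has(id)),
  };
}
```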
The mapping exercise revealed that experiment attribution (ABC Pricing, Pro Plan) is NOT a sub-metric problem — it's a PostHog experiment infrastructure gap. Don't try to solve it in the dashboard.
Instead:
- Track experiments in PostHog (already instrumented)
- Show experiment status as annotations on the relevant question tile ("ABC test running, variant B leading by +12%")
- When an experiment concludes, record the result in the feature's /launch entry as the verdict
- The dashboard shows the outcome impact, PostHog shows the attribution
This keeps the dashboard clean (health monitoring) while PostHog handles the attribution layer.
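A sketch of how a tile annotation could be rendered in kpi-dashboard/index.html. The `getExperimentsForTile` helper is an assumption (a PostHog API call, a nightly export, whatever fits); only the rendering side is shown, and all names are illustrative.

```typescript
// Hypothetical annotation layer for question tiles; shapes and names are illustrative.
type ExperimentStatus = {
  name: string;            // e.g. "ABC Pricing"
  state: "running" | "concluded";
  leadingVariant?: string; // e.g. "B"
  liftPct?: number;        // e.g. 12
};

// Assumed helper: resolves which experiments are relevant to a given question tile.
declare function getExperimentsForTile(tileId: string): Promise<ExperimentStatus[]>;

async function renderExperimentAnnotations(tileId: string, tileEl: HTMLElement): Promise<void> {
  const experiments = await getExperimentsForTile(tileId);
  for (const exp of experiments.filter((e) => e.state === "running")) {
    const note = document.createElement("div");
    note.className = "experiment-annotation";
    note.textContent =
      `${exp.name} test running` +
      (exp.leadingVariant !== undefined ? `, variant ${exp.leadingVariant} leading by +${exp.liftPct}%` : "");
    tileEl.appendChild(note);
  }
}
```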
| File | Change | Priority |
|---|---|---|
| CLAUDE.md (KPI-First section) | Replace with Outcome-First Planning | Do now — it's the instruction that guides every session |
| .claude/skills/writing-plans/SKILL.md | Add Phase 0: Measurement Validation | Phase 3a of plan |
| .claude/skills/reviewing-plans/SKILL.md | Upgrade Dimension 7 | Phase 3b |
| .claude/skills/launch/SKILL.md | Add outcome chain + expected_impact_on | Phase 3b |
| .claude/skills/launch-metrics/SKILL.md | Reference the 8 validated proxies | Phase 3c |
| kpi-dashboard/index.html | Experiment annotations on tiles | Phase 2 |
| scripts/kpi-refresh.js | Baseline capture at launch registration | Phase 1 |
| Memory: feedback_three_kpi_principles.md | Already updated | ✅ Done |
The process that produced the right answer was:
- Build something → get feedback that it's wrong
- Analyze WHY it's wrong → discover a deeper principle
- Apply the principle → validate with data
- Review → find gaps → fix → review again
This is exactly the process the KPI system should enforce for features:
- Ship a feature → measure the result
- If the metric doesn't move → ask why (wrong metric? wrong feature? wrong proxy?)
- Validate the proxy → adjust the metric or the strategy
- Repeat
The KPI system and the process for improving it should follow the same loop.