Single source of truth for how RotatingRoom defines, validates, and tracks metrics. Referenced by: CLAUDE.md, /launch, /writing-plans, /reviewing-plans, /launch-metrics, /monitor-launches. Last updated: 2026-04-13
Every metric must pass a validation chain before we trust it:
Ultimate Outcome (what we actually care about)
↓ validated by production data analysis
Proxy Metric (what we can measure frequently)
↓ mapped to
Features (what we built to move this metric)
↓ embedded in
Plans → Reviews → /launch → Dashboard → Digest
We don't pick metrics because they're easy to collect. We pick them because we've validated that they predict the outcome we care about.
Every non-trivial feature must answer BEFORE implementation begins:
- Why are we doing this? What business value does this create? Which ultimate outcome does it serve?
- How would we know if it worked? What observable change would prove success — ideally attributable to this specific intervention?
- What will we measure, on what cadence? Concrete metric, data source (Stripe/SQL/PostHog/GSC — the system of record, never approximations), frequency, evidence-based target.
- How do we know this proxy actually predicts the outcome? Cite the validation analysis, or flag that a Phase 0 validation is needed before the metric can be trusted.
- Don't pick metrics because they're easy to collect (Rollbar errors as proxy for behavioral bugs)
- Don't hardcode business logic when the source of truth has real data (hardcoded plan prices vs Stripe API)
- Don't track metrics without a decision rule ("if X doesn't move by Y, we will Z")
- Don't add measurement as an afterthought — it's part of planning, not a post-launch task
- Don't assume a proxy works because it sounds reasonable — validate with data first
Top-level SaaS/marketplace metrics that characterize overall business health. These rarely change — they're the vital signs.
Cadence: Collected daily, reviewed weekly, targets revisited annually.
| # | Metric | Precise definition | Current | Source | SQL/API |
|---|---|---|---|---|---|
| 1 | MRR | Sum of (unit_amount / interval_months) for all active Stripe subscriptions, minus coupon discounts. Paginated (2,000+ subs). | $36,085 | Stripe API | getAll("/subscriptions?status=active") → compute monthly equivalent per sub |
| 2 | Active paid listings | Count of distinct rooms.id where rooms.status='active' AND the room's user has a Stripe subscription with stripe_status='active'. | 1,985 | SQL | SELECT count(DISTINCT r.id) FROM rooms r JOIN subscriptions s ON r.user_id=s.user_id WHERE r.status='active' AND s.stripe_status='active' |
| 3 | Messages/day (7d avg) | count(*) FROM chat_messages WHERE created_at >= now() - INTERVAL '7 days' divided by 7. Seasonal context noted (historical range: 229-888/day). | ~687/day | SQL | See definition |
| 4 | 30-day listing retention | Of rooms that were status='active' with an active subscription 30 days ago, what % still have status='active' with an active subscription today. Per-room, not per-user. A room that was rented out and delisted counts as churned (we can't separate "successful exit" from churn currently). | 82.0% | SQL | Retained: rooms.status='active' AND rooms.created_at < now()-30d AND subscription.stripe_status='active'. Churned: rooms.created_at < now()-30d AND subscription canceled AND updated_at >= now()-30d. Rate = retained/(retained+churned). |
| 5 | New signups/month | count(*) FROM users WHERE created_at >= now() - INTERVAL '30 days' | ~2,838 | SQL | See definition |
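The metric-1 computation can be sketched as a pure function over already-fetched subscription data. This is a minimal sketch, assuming each subscription has been flattened to a dict; the field names (`unit_amount`, `interval`, `interval_count`, `percent_off`) are illustrative, not the raw Stripe payload shape:

```python
def monthly_amount(unit_amount_cents, interval, interval_count=1):
    # Normalize a price to its monthly-equivalent dollar amount.
    months = {"month": 1, "year": 12}[interval] * interval_count
    return unit_amount_cents / 100 / months

def compute_mrr(subscriptions):
    # Sum monthly equivalents across active subs, applying percent-off coupons.
    total = 0.0
    for sub in subscriptions:
        sub_total = sum(
            monthly_amount(i["unit_amount"], i["interval"], i.get("interval_count", 1))
            for i in sub["items"]
        )
        total += sub_total * (1 - sub.get("percent_off", 0) / 100)
    return round(total, 2)
```

For example, a $20/mo sub plus a $120/yr sub carrying a 50%-off coupon yields $25.00 of MRR (20 + 10 × 0.5).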
Tier 1 answers: "Is the business healthy right now?"
These are lagging indicators. You can't plan around them directly, but you watch them to confirm tier-2 interventions are working. Revisit the set annually.
Seasonal normalization for messages/day: Volume has clear seasonality (Thanksgiving week: 229/day, late March: 888/day). For now, track raw 7d rolling average with seasonal context notes in the dashboard. Add year-over-year comparison when 12 months of data available (September 2026).
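The 7d rolling average for messages/day is just the trailing mean of daily counts. A minimal sketch (the input list is hypothetical):

```python
def rolling_7d_avg(daily_counts):
    # Trailing 7-day mean of daily message counts (most recent day last).
    # Returns None until a full week of data exists.
    if len(daily_counts) < 7:
        return None
    return sum(daily_counts[-7:]) / 7
```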
Higher-level proxies tied to strategic priorities. Each represents an initiative the team is actively intervening on. These roughly align with epics — you CAN plan around "reduce churn" or "grow travel nurse traffic."
Cadence: Full set defined yearly. 3 active per quarter (with targets + decision rules). 3 monitoring (watching, no target, activation triggers defined). Rotate at quarterly review.
Hard rule: 3 active per quarter. A 3-person team can drive 3 things well, not 6.
Structure: Each tier-2 initiative has:
- Primary metric — the one number we're targeting (with Q target and decision rule)
- Sub-metrics — related measures providing context (no target, permanently associated)
- Decision rule — "If [metric] doesn't reach [target] by end of Q, we will [action]"
- Sub-metrics are NOT tier-3 metrics (which are feature-level and ephemeral).
1. Improve Listing Retention 🟢 ACTIVE
| Role | Metric | Precise definition | Current | Q2 Target |
|---|---|---|---|---|
| Primary | 30-day listing retention rate | Of rooms that were status='active' with active subscription 30 days ago, % still active with active subscription today. Per-room (same room ID). | 82.0% | >85% |
| Sub | Monthly logo churn rate | Subscriptions canceled in 30d / (active + canceled in 30d). Source: Stripe canceled_at. | 24.9% | — |
| Sub | NRR | MRR from customers who existed 30d ago / their MRR then × 100. (Build when Stripe snapshot available.) | ~85-90% est. | — |
| Sub | No-inquiry churn rate | Of rooms churned in last 30d, % that had 0 lifetime conversations. | TBD | — |
Decision rule: If retention doesn't improve by 3+ pts by end of Q2, invest in ChurnKey retention flows and/or launch the messaging strategy experiments. Proxy validation: Directly measures what we care about. NRR is the gold standard SaaS metric but at 25% churn it's noisy and requires Stripe invoice reconstruction — kept as sub-metric. Note: "Retained" = same room still active. Rooms delisted because the lister successfully rented them count as churned (we can't distinguish "success" from "abandonment" currently).
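The tier-1 formula Rate = retained/(retained+churned) underlying this primary metric is a simple cohort ratio; a sketch with illustrative counts:

```python
def retention_rate(retained, churned):
    # 30-day listing retention: retained / (retained + churned), as a percent.
    cohort = retained + churned
    return round(100 * retained / cohort, 1) if cohort else 0.0
```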
2. Increase Listing Liquidity 🟢 ACTIVE
| Role | Metric | Precise definition | Current | Q2 Target |
|---|---|---|---|---|
| Primary | Listing liquidity (14d) | count(rooms with ≥1 conversation WHERE conversation.created_at >= now()-14d) / count(all paid rooms) × 100. "Paid rooms" = rooms.status='active' with active Stripe subscription. | 54.2% | >60% |
| Sub | Total lifetime conversations per listing | count(DISTINCT conversations.id) for each room since room creation. Median across all paid rooms. | TBD | — |
| Sub | Zero-lifetime listings | Paid rooms with 0 conversations ever. Highest-risk group (8.5% retention). | TBD | — |
Decision rule: If liquidity doesn't reach 60% by end of Q2, investigate: demand problem (not enough searchers) or matching problem (searchers can't find the right listings)? Proxy validation: 0 conversations = 8.5% retention. ≥1 = 27.9% (3.3x). Recency matters: last-7-days inquiry = +15.7pt retention lift. (n=2,099 first-time paid subs, Jul 2025–Mar 2026.)
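The liquidity definition above can be sketched as a pure function over in-memory data; the input shapes (a set of paid room IDs, (room_id, created_at) pairs) are illustrative assumptions, not the production query:

```python
from datetime import datetime, timedelta

def listing_liquidity_14d(paid_room_ids, conversations, now):
    # Share of paid rooms with >=1 conversation created in the last 14 days.
    # `conversations` is an iterable of (room_id, created_at) pairs.
    if not paid_room_ids:
        return 0.0
    cutoff = now - timedelta(days=14)
    contacted = {room_id for room_id, created in conversations
                 if created >= cutoff and room_id in paid_room_ids}
    return round(100 * len(contacted) / len(paid_room_ids), 1)
```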
3. Grow Travel Nurse Pipeline 🟢 ACTIVE
| Role | Metric | Precise definition | Current | Q2 Target |
|---|---|---|---|---|
| Primary | TN users who sent messages/week | Distinct chat_messages.user_id in last 7 days WHERE user has a verification_requests row with category matching '%travel nurse%' or '%nursing%' (case-insensitive). | ~3/wk (range 1-12) | >8/wk (4-week avg) |
| Sub | TN page clicks/week | GSC clicks on URLs containing /travel-nurse | ~10/wk | — |
| Sub | TN keyword impressions/week | GSC impressions for queries matching "travel nurse housing" and related terms | 1,089/wk | — |
| Sub | TN keyword basket avg position | GSC average position for ["travel nurse housing", "travel nurse rooms", "travel nurse rentals"] | #12 | — |
| Sub | Nursing verifications/month | count(*) FROM verification_requests WHERE (category LIKE '%nurse%' OR category LIKE '%nursing%') AND created_at >= now()-30d | 23/mo | — |
Decision rule: If 4-week avg TN messagers doesn't reach 8/wk by end of Q2, reassess whether TN is the right acquisition wedge or if we need broader medical professional targeting. Note on noise: Weekly range is 1-12. Accept noise; evaluate on 4-week rolling average. This is an early-stage wedge metric — directional signal matters more than precision.
4. Improve Lister Reply Rate ⚪ MONITORING
| Role | Metric | Precise definition | Current |
|---|---|---|---|
| Primary | Lister reply rate | Of conversations created in last 30d, % where EXISTS (chat_message FROM owner_id). | 49.8% |
| Sub | Searcher follow-up rate | Of conversations where lister replied, % where searcher sent ≥1 subsequent message. | TBD |
| Sub | Thread completion rate | Conversations with ≥2 messages from each side / total conversations (30d). | ~28% |
Activation trigger: If reply rate drops below 45%, escalate to active with target. Proxy validation: Reply predicts thread completion (0%→40.4%) and searcher re-engagement (37.9%→59.3%).
5. Contain Fraud ⚪ MONITORING
| Role | Metric | Precise definition | Current |
|---|---|---|---|
| Primary | Searchers who contacted fraudsters/month | count(DISTINCT chat_messages.user_id) where the OTHER user in the conversation has users.blocked=true OR users.is_fraudulent=true, last 30 days. | 37 |
| Sub | Accounts flagged/month | count(*) FROM users WHERE (blocked=true OR is_fraudulent=true) AND updated_at >= now()-30d | TBD |
| Sub | Median time-to-block (hours) | PERCENTILE_CONT(0.5) of (blocked_at - created_at) for blocked users | TBD |
Activation trigger: If searchers affected exceeds 60/mo, escalate to active. Measurement note: Lower bound. Revises upward as more accounts flagged. Measure same day monthly.
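The PERCENTILE_CONT(0.5) sub-metric has a direct in-memory equivalent; a sketch assuming blocked users are dicts with `created_at`/`blocked_at` datetimes (illustrative field names):

```python
from datetime import datetime, timedelta
from statistics import median

def median_time_to_block_hours(blocked_users):
    # PERCENTILE_CONT(0.5) equivalent: median hours from signup to block.
    deltas = [(u["blocked_at"] - u["created_at"]).total_seconds() / 3600
              for u in blocked_users]
    return median(deltas) if deltas else None
```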
6. Improve Case Mix ⚪ MONITORING
| Role | Metric | Precise definition | Current |
|---|---|---|---|
| Primary | Case mix score | Average of: (a) non-monthly % of new paid subs in last 30d, and (b) premium+pro % of new paid subs in last 30d. Denominator = subscriptions WHERE created_at >= now()-30d AND stripe_price != 'rr-free'. | 32.1% |
| Sub | Non-monthly % of new subs | (New annual + quarterly) / total new paid subs | 18.4% |
| Sub | Premium/Pro % of new subs | (New premium + pro) / total new paid subs | 45.8% |
| Sub | Upgrade rate | Existing standard subs that changed to premium stripe_price in period | TBD |
Activation trigger: If case mix score drops below 28%, escalate to active. Key insight: 81.6% of new subs choose monthly. Premium is winning (45.8%) but annual commitment is not (17.1%).
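The case mix score is the simple average of its two components, which reproduces the current value from the table:

```python
def case_mix_score(non_monthly_pct, premium_pro_pct):
    # Simple average of the two components, to one decimal place.
    return round((non_monthly_pct + premium_pro_pct) / 2, 1)
```

With the current sub-metrics: case_mix_score(18.4, 45.8) gives 32.1.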
The live union of all KPI metrics associated with active launches. When a launch graduates or gets killed, its tier-3 metrics automatically disappear.
Cadence: Reviewed weekly during launch monitoring. Decisions made when soak period ends (graduate, extend, or kill).
Source: The /launch skill creates these. Each launch entry has metrics[] with source, query, target, and cadence. The kpi-refresh.js script collects them daily.
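A hypothetical sketch of the shape such a launch entry might take, with a minimal check of the fields the daily collector needs. The field names here are inferred from this section only, not the actual /launch schema or kpi-refresh.js:

```python
# Hypothetical launch entry; field names inferred from this guide,
# not the actual schema consumed by kpi-refresh.js.
launch_entry = {
    "feature": "free-listing-deferred-activation",
    "tier2_initiative": "Increase Listing Liquidity",
    "metrics": [
        {"name": "premature_activations",
         "source": "SQL",
         "query": "SELECT count(*) FROM rooms WHERE ...",  # elided
         "target": 0,
         "cadence": "daily"},
    ],
}

REQUIRED_METRIC_KEYS = {"name", "source", "query", "target", "cadence"}

def valid_metrics(entry):
    # Every tier-3 metric must declare all fields the daily collector needs.
    return all(REQUIRED_METRIC_KEYS <= m.keys() for m in entry["metrics"])
```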
Structure: Tier-3 metrics are defined using the four questions:
- Why this feature? → which tier-2 initiative does it serve?
- How would we know? → specific observable outcome
- What to measure? → source of truth, not proxy-of-proxy
- Does this proxy predict the outcome? → reference a validated proxy or flag for Phase 0
Examples of current tier-3 metrics:
- Free listing deferred activation: premature activations = 0 (SQL) → serves liquidity (tier 2)
- Edu fraud prevention: Rollbar .edu errors = 0 → serves fraud containment (tier 2)
- LCP optimization: homepage LCP < 2.5s (PSI) → serves organic growth (tier 2)
- Blog migration: blog impressions ≥ pre-migration baseline (GSC) → serves TN pipeline (tier 2)
Graduation: A tier-3 metric graduates when the decision is made on its launch — not on a fixed clock. "We've seen enough to decide" is the criterion, with the soak period as a minimum.
Tier-3 metrics are NOT sub-metrics. Sub-metrics are permanently associated with a tier-2 initiative. Tier-3 metrics come and go with features.
| Activity | When | Who | What happens |
|---|---|---|---|
| Tier-3 review | Weekly | Gaurav + Claude | Review all active launches. Graduate, extend, or kill. Tier-3 metrics update automatically. |
| Tier-2 review | End of quarter | Gaurav + Megan | Review Q performance vs targets. Decide Q+1 priorities. Rotate tier-2 metrics if priorities change. |
| Tier-1 review | Annually | Gaurav | Are these still the right health metrics? Add/remove as the business evolves. |
| New feature | At planning time | Claude (via /writing-plans) | Answer 4 questions. Map to tier-2 initiative. Define tier-3 metrics. |
| Feature launch | At deploy time | Claude (via /launch) | Register tier-3 metrics. Capture baselines. Start collection. |
| Feature graduation | Soak period + decision | Claude (via /monitor-launches) | Graduate → tier-3 metrics disappear. Kill → tier-3 metrics disappear + learning captured. |
These analyses validate which proxies predict business outcomes. Referenced when defining new metrics — use these validated proxies before proposing new ones.
| Proxy | Predicts | Evidence | Analysis details |
|---|---|---|---|
| Listing liquidity ≥1 conv in 14d | Lister retention | 8.5% (0 convos) → 27.9% (1) → 52.2% (11+) | n=2,099 first-time paid subs, Jul 2025–Mar 2026 |
| Lister reply rate | Thread completion + searcher engagement | 0%→40.4% completion; 37.9%→59.3% re-engagement | n=6,722 searchers |
| Median response time | Thread completion by speed | 40.4% at <1h → 18.2% at >48h | Same analysis, bucketed by response time |
| Recency of inquiry | Monthly sub retention | Last 7d: +15.7pt lift. First 7d: +9.0pt | Monthly first-time subs only |
| Annual billing cycle | Revenue durability | 50.7% annual retention vs 12.3% monthly | n=5,448 subs since Jul 2025 |
| Total lifetime conversations | Churn risk | 0 convos: 8.5%. 1-3: 30.4%. 4-10: 41.6%. 11+: 52.2% | n=2,099 first-time paid subs |
| MRR (Stripe) | Revenue health | Direct measure (not proxy) | Stripe API, paginated, with discounts |
| Searchers contacting fraudsters | Trust / harm | Direct measure | 37/mo as of April 2026 |
Before adding a metric that isn't in the table above:
- State the ultimate outcome (retention, revenue, trust, engagement)
- Propose the proxy metric
- Run a SQL/Stripe analysis: "For users with high/low values of this proxy, does the outcome differ significantly?"
- Document: threshold, lift, sample size, time period
- If validated: add to the table above and use it. If not: find a better proxy.
If you can't run the analysis (new feature, no data yet): flag as "unvalidated proxy" in the /launch entry. Schedule a Phase 0 validation after the feature has been live for 30+ days.
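The high/low comparison in the validation steps above can be sketched as a threshold split over per-user records; the record fields here are illustrative:

```python
def proxy_lift(records, proxy_key, outcome_key, threshold):
    # Outcome rate for records at/above the proxy threshold vs. below it.
    # Returns (rate_high, rate_low, lift) as fractions.
    high = [r[outcome_key] for r in records if r[proxy_key] >= threshold]
    low = [r[outcome_key] for r in records if r[proxy_key] < threshold]

    def rate(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return rate(high), rate(low), rate(high) - rate(low)
```

For example, splitting listers on "≥1 lifetime conversation" and comparing retention rates between the two groups is exactly the analysis behind the liquidity proxy in the table above.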
| Skill | Reads this guide for | What it does |
|---|---|---|
| /writing-plans | Tier-2 initiative list, four questions, proxy validation | Requires "Measurement Validation" section in plans. References tier-2 initiatives. |
| /reviewing-plans | Dimension 7 criteria, validated proxies | Checks: is the proposed metric validated? Is the source of truth correct? Decision rule defined? |
| /launch | Four questions, tier-2 mapping, tier-3 definition | Registers feature with business_question, proxy_validation, expected_impact_on, baseline_at_launch |
| /launch-metrics | Validated proxy table, proxy validation process | Framework for defining metrics. "Check validated proxies first." |
| /monitor-launches | Tier-3 graduation criteria, baseline comparison | Weekly review: compare current vs baseline. Flag overdue graduations. |
| /ready-for-review | Step 1.8 | Soft warning if PR ships production code without outcome-first measurement. |
| CLAUDE.md | Four questions, tier-2 list (summary) | "Outcome-First Planning" section references this document. |
At the end of each quarter:
## Q[N] Review — [Date]
### Tier-2 Performance
| Initiative | Primary metric | Start of Q | End of Q | Target | Hit? |
|-----------|---------------|-----------|---------|--------|------|
| ... | ... | ... | ... | ... | Yes/No |
### Key learnings
- What worked and why?
- What didn't work and why?
- What surprised us?
### Tier-2 changes for Q[N+1]
- Which initiatives continue? (with same or revised targets)
- Which rotate out? (why — achieved goal, deprioritized, or replaced)
- Which new initiatives come in? (why — new strategic priority, new data)
### Tier-1 check
- Do the tier-1 metrics still represent business health? Any to add/remove?
| Date | Change |
|---|---|
| 2026-04-12 | Created. 3-tier framework, Q2 initiatives, validated proxies, process integration. |