Layer Effectiveness Metrics
L0 Bouncer Effectiveness
| Metric | What it measures | Calculation |
|---|---|---|
| Coverage | % handled without escalation | (total - needs_l1) / total |
| Accuracy | When confident, is it correct? | correct / confident_predictions |
| False confidence | Confident but wrong | (confident & incorrect) / confident = 1 - accuracy |
| Escalation rate | % sent to L1 | needs_l1 / total |
Goal: high coverage and high accuracy, so most samples take the fast path without losing reliability
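A minimal Python sketch of the L0 metrics follows. It assumes each sample is logged as a record holding L0's prediction, the ground-truth label and a `needs_l1` escalation flag. The `L0Result` type and its field names are hypothetical and are not part of the system described here.

```python
import math
from dataclasses import dataclass


@dataclass
class L0Result:
    """Hypothetical per-sample record; field names are illustrative."""
    pred: str       # L0's predicted label
    label: str      # ground-truth label
    needs_l1: bool  # True when L0 was not confident and escalated to L1


def l0_metrics(results: list[L0Result]) -> dict[str, float]:
    """Coverage, accuracy, false confidence and escalation rate for L0."""
    total = len(results)
    escalated = sum(r.needs_l1 for r in results)
    confident = [r for r in results if not r.needs_l1]
    n_conf = len(confident)
    correct = sum(r.pred == r.label for r in confident)
    wrong = n_conf - correct
    return {
        "coverage": (total - escalated) / total,
        # Undefined (NaN) if L0 was never confident.
        "accuracy": correct / n_conf if n_conf else math.nan,
        "false_confidence": wrong / n_conf if n_conf else math.nan,
        "escalation_rate": escalated / total,
    }


results = (
    [L0Result("safe", "safe", False)] * 6         # confident and correct
    + [L0Result("violation", "safe", False)] * 2  # confident but wrong
    + [L0Result("safe", "safe", True)] * 2        # escalated to L1
)
print(l0_metrics(results))
# {'coverage': 0.8, 'accuracy': 0.75, 'false_confidence': 0.25, 'escalation_rate': 0.2}
```

False confidence is the complement of accuracy on confident predictions, and coverage is the complement of the escalation rate, so only two of the four numbers are independent.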
L1 Analyst Effectiveness
| Metric | What it measures | Calculation |
|---|---|---|
| Correction rate | Fixed L0 mistakes | (l1_correct & l0_wrong) / l1_samples |
| Agreement rate | Confirmed L0 | (l1_pred == l0_pred) / l1_samples |
| Value added | Accuracy improvement over L0 | l1_accuracy - l0_accuracy (on the same samples) |
| Escalation rate | % sent to L2 | needs_l2 / l1_samples |
Goal: a high correction rate, which is what justifies L1's extra compute cost
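The L1 metrics can be sketched the same way, assuming each escalated sample records L0's tentative prediction alongside L1's. `L1Sample` and its fields are hypothetical. All rates are computed over the samples L1 actually received (`l1_samples`).

```python
from dataclasses import dataclass


@dataclass
class L1Sample:
    """Hypothetical record for a sample L0 escalated; field names are assumptions."""
    label: str      # ground-truth label
    l0_pred: str    # L0's tentative (low-confidence) prediction
    l1_pred: str    # L1's prediction
    needs_l2: bool  # True when L1 escalated to L2


def l1_metrics(samples: list[L1Sample]) -> dict[str, float]:
    """L1 Analyst metrics over the samples L1 received."""
    n = len(samples)
    l0_correct = sum(s.l0_pred == s.label for s in samples)
    l1_correct = sum(s.l1_pred == s.label for s in samples)
    corrected = sum(s.l1_pred == s.label and s.l0_pred != s.label for s in samples)
    agreed = sum(s.l1_pred == s.l0_pred for s in samples)
    return {
        "correction_rate": corrected / n,
        "agreement_rate": agreed / n,
        # l1_accuracy - l0_accuracy, both measured on the same samples
        "value_added": (l1_correct - l0_correct) / n,
        "escalation_rate": sum(s.needs_l2 for s in samples) / n,
    }


samples = [
    L1Sample("a", "b", "a", False),  # L1 corrected L0
    L1Sample("a", "b", "a", False),  # L1 corrected L0
    L1Sample("a", "a", "a", False),  # L1 confirmed a correct L0
    L1Sample("a", "b", "b", True),   # both wrong, escalated to L2
    L1Sample("a", "a", "b", False),  # L1 overrode a correct L0
]
print(l1_metrics(samples))
# {'correction_rate': 0.4, 'agreement_rate': 0.4, 'value_added': 0.2, 'escalation_rate': 0.2}
```

The last sample shows why value added can be lower than the correction rate: L1 sometimes overrides an L0 prediction that was already correct, which the correction rate alone does not reveal.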
L2 Gauntlet Effectiveness
| Metric | What it measures | Calculation |
|---|---|---|
| Final arbiter accuracy | Correct on hardest cases | l2_correct / l2_samples |
| Tiebreaker value | Resolved disagreements correctly | resolved_correct / disagreements |
| Policy detection | Which policies catch what | violations per policy |
Goal: high accuracy on the edge cases that L0 and L1 couldn't resolve
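A sketch of the L2 metrics follows, under two assumptions not spelled out above: a "disagreement" means L0 and L1 predicted different labels, and each L2 sample carries the list of policies it was flagged under. `L2Sample` and the policy names are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class L2Sample:
    """Hypothetical record for a sample that reached L2; fields are assumptions."""
    label: str
    l0_pred: str
    l1_pred: str
    l2_pred: str
    violated_policies: list[str] = field(default_factory=list)  # policies L2 flagged


def l2_metrics(samples: list[L2Sample]) -> dict:
    n = len(samples)
    # Assumption: a disagreement is any sample where L0 and L1 differ.
    disagreements = [s for s in samples if s.l0_pred != s.l1_pred]
    resolved_correct = sum(s.l2_pred == s.label for s in disagreements)
    return {
        "final_accuracy": sum(s.l2_pred == s.label for s in samples) / n,
        "tiebreaker_value": (resolved_correct / len(disagreements)
                             if disagreements else math.nan),
        # Count of flagged violations per policy across all L2 samples.
        "violations_per_policy": Counter(
            p for s in samples for p in s.violated_policies),
    }


samples = [
    L2Sample("a", "a", "b", "a", ["pii"]),              # disagreement, resolved correctly
    L2Sample("a", "b", "a", "b", ["pii", "toxicity"]),  # disagreement, resolved wrongly
    L2Sample("b", "b", "b", "b"),                       # no disagreement, correct
    L2Sample("a", "b", "b", "a", ["spam"]),             # no disagreement, correct
]
print(l2_metrics(samples))
```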
Cascade System Effectiveness
| Metric | What it measures | Calculation |
|---|---|---|
| Overall accuracy | Final prediction correct | all_correct / total |
| Efficiency | Share resolved by L0 alone (proxy for compute saved) | l0_only / total |
| Cost ratio | Average relative compute per sample (weighed against accuracy) | (l0×1 + l1×10 + l2×100) / total |
| Layer value | Each layer's contribution | See below |
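A sketch of the system-level metrics follows. It reads the cost-ratio formula as (l0×1 + l1×10 + l2×100) / total and assumes l0, l1 and l2 count the samples whose final decision came from that layer. `CascadeOutcome` and `LAYER_COST` are hypothetical names, and the weights should be replaced with measured per-layer costs.

```python
from dataclasses import dataclass

# Relative per-sample compute cost of each layer (L0 = 1, L1 = 10, L2 = 100),
# as read from the cost-ratio formula; assumed, not measured.
LAYER_COST = {"l0": 1, "l1": 10, "l2": 100}


@dataclass
class CascadeOutcome:
    """Hypothetical record of where each sample's final decision was made."""
    final_layer: str  # "l0", "l1" or "l2"
    correct: bool     # final prediction matched ground truth


def cascade_metrics(outcomes: list[CascadeOutcome]) -> dict[str, float]:
    total = len(outcomes)
    counts = {layer: 0 for layer in LAYER_COST}
    for o in outcomes:
        counts[o.final_layer] += 1
    cost = sum(LAYER_COST[layer] * n for layer, n in counts.items())
    return {
        "overall_accuracy": sum(o.correct for o in outcomes) / total,
        "efficiency": counts["l0"] / total,  # l0_only / total
        "cost_ratio": cost / total,
    }


outcomes = (
    [CascadeOutcome("l0", True)] * 7
    + [CascadeOutcome("l0", False)]
    + [CascadeOutcome("l1", True), CascadeOutcome("l2", True)]
)
print(cascade_metrics(outcomes))
# {'overall_accuracy': 0.9, 'efficiency': 0.8, 'cost_ratio': 11.8}
```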
Layer Value Breakdown
Total correct: 950/1050
Breakdown (share of all 1050 samples):
- L0 confident & correct: 800 (76.2%) ← Fast path
- L0 confident & wrong: 50 (4.8%) ← L0 failures
- L1 corrected L0: 60 (5.7%) ← L1 value
- L1 confirmed L0: 100 (9.5%) ← L1 validation
- L2 final decision: 40 (3.8%) ← L2 value
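The percentages in the breakdown are shares of all 1050 samples rather than of the 950 correct ones; for example, 800 / 1050 ≈ 76.2%. This small helper reproduces them from the listed counts.

```python
def breakdown_shares(counts: dict[str, int]) -> dict[str, float]:
    """Each category's share of all samples, in percent, rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Category counts from the breakdown above.
counts = {
    "L0 confident & correct": 800,
    "L0 confident & wrong": 50,
    "L1 corrected L0": 60,
    "L1 confirmed L0": 100,
    "L2 final decision": 40,
}
print(sum(counts.values()))      # 1050
print(breakdown_shares(counts))  # 76.2, 4.8, 5.7, 9.5, 3.8 in the order above
```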
=== CASCADE EFFECTIVENESS ===
L0 Bouncer:
- Coverage: 85% (handled without escalation)
- Accuracy when confident: 94%
- False confidence rate: 6%
L1 Analyst:
- Samples received: 150
- Correction rate: 40% (fixed 60 L0 mistakes)
- Agreement rate: 60%
- Value added: +12% accuracy on uncertain samples
L2 Gauntlet:
- Samples received: 40
- Final accuracy: 85%
Overall:
- Accuracy: 90.5%
- Efficiency: 85% handled by L0 alone
- Effective cost: 1.35x (vs 3x if all went to L2)