calculate.md

Layer Effectiveness Metrics

L0 Bouncer Effectiveness

Metric	What it measures	Calculation
Coverage	% handled without escalation	(total - needs_l1) / total
Accuracy	When confident, is it correct?	correct / confident_predictions
False confidence	Confident but wrong	confident_incorrect / confident_predictions
Escalation rate	% sent to L1	needs_l1 / total

Goal: High coverage + high accuracy = fast and reliable
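A minimal sketch of the L0 calculations, assuming every sample that is not escalated counts as a confident prediction (so `confident_predictions = total - needs_l1`). The `L0Counts` container, the `l0_metrics` helper, and the counts in the usage line are illustrative, not from a real run.

```python
from dataclasses import dataclass


@dataclass
class L0Counts:
    total: int              # every sample L0 saw
    needs_l1: int           # samples L0 escalated to L1
    confident_correct: int  # confident L0 predictions that matched the label


def l0_metrics(c: L0Counts) -> dict:
    # Assumption: a prediction is "confident" exactly when it is not escalated.
    confident = c.total - c.needs_l1
    if c.total <= 0 or confident <= 0:
        raise ValueError("need at least one sample and one confident prediction")
    return {
        "coverage": confident / c.total,
        "accuracy": c.confident_correct / confident,
        "false_confidence": (confident - c.confident_correct) / confident,
        "escalation_rate": c.needs_l1 / c.total,
    }


# Illustrative counts only.
m = l0_metrics(L0Counts(total=1000, needs_l1=150, confident_correct=800))
print({k: round(v, 3) for k, v in m.items()})
```

Under this assumption, coverage and escalation rate always sum to 1, as do accuracy and false confidence, so tracking one of each pair is enough.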

L1 Analyst Effectiveness

Goal: High correction rate = worth the compute cost
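This section gives no formulas for L1, so the sketch below infers them from the sample report further down: correction rate as the share of received samples where L1 overturned L0, and agreement rate as the share where it confirmed L0. Both definitions, the `l1_metrics` name, and the `confirmed=90` count are assumptions.

```python
def l1_metrics(received: int, corrected: int, confirmed: int) -> dict:
    """Rates for the L1 layer (inferred definitions, not stated in the source).

    received:  samples escalated from L0 to L1
    corrected: samples where L1 overturned a wrong L0 answer
    confirmed: samples where L1 agreed with L0
    """
    if received <= 0:
        raise ValueError("L1 received no samples")
    return {
        "correction_rate": corrected / received,
        "agreement_rate": confirmed / received,
    }


# Illustrative counts: 150 received, 60 corrected, 90 confirmed.
r = l1_metrics(received=150, corrected=60, confirmed=90)
print(r)
```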

L2 Gauntlet Effectiveness

Metric	What it measures	Calculation
Final arbiter accuracy	Correct on hardest cases	l2_correct / l2_samples
Tiebreaker value	Resolved disagreements correctly	resolved_correct / disagreements
Policy detection	Which policies catch what	violations per policy

Goal: High accuracy on edge cases L0+L1 couldn't handle
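The three L2 rows could be computed as in this sketch. It assumes each detected violation is logged as a policy name, so "violations per policy" becomes a simple tally. The function name, the policy names, and all counts are hypothetical.

```python
from collections import Counter


def l2_metrics(l2_correct, l2_samples, resolved_correct, disagreements, violations):
    """violations: iterable of policy names, one entry per detected violation."""
    if l2_samples <= 0:
        raise ValueError("L2 received no samples")
    return {
        "final_arbiter_accuracy": l2_correct / l2_samples,
        # Assumption: report 0.0 when there were no disagreements to resolve.
        "tiebreaker_value": resolved_correct / disagreements if disagreements else 0.0,
        "policy_detection": Counter(violations),
    }


# Hypothetical inputs.
g = l2_metrics(
    l2_correct=34,
    l2_samples=40,
    resolved_correct=9,
    disagreements=12,
    violations=["spam", "spam", "pii"],
)
print(g["final_arbiter_accuracy"], g["tiebreaker_value"], dict(g["policy_detection"]))
```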

Cascade System Effectiveness

Metric	What it measures	Formula
Overall accuracy	Final prediction correct	all_correct / total
Efficiency	Compute saved	l0_only / total
Cost ratio	Compute vs accuracy tradeoff	(l0*1 + l1*10 + l2*100) / total
Layer value	Each layer's contribution	See below
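A sketch of the system-level formulas. It reads the cost ratio as a weighted sum with relative per-sample costs of 1, 10, and 100 for L0, L1, and L2, and treats `l0`, `l1`, `l2` as the number of samples each layer processed. Both readings are assumptions, so the weights are a parameter. The usage line reuses counts from the Layer Value Breakdown and assumes every sample passes through L0.

```python
def cascade_metrics(all_correct, total, l0_only, l0, l1, l2, weights=(1, 10, 100)):
    """System-level metrics; weights are assumed relative per-sample costs."""
    if total <= 0:
        raise ValueError("total must be positive")
    w0, w1, w2 = weights
    return {
        "overall_accuracy": all_correct / total,
        "efficiency": l0_only / total,
        "cost_ratio": (l0 * w0 + l1 * w1 + l2 * w2) / total,
    }


# l0=1050 assumes every sample is seen by L0; l1 and l2 are samples escalated.
c = cascade_metrics(all_correct=950, total=1050, l0_only=850, l0=1050, l1=160, l2=40)
print({k: round(v, 3) for k, v in c.items()})
```

Different weights change the cost ratio substantially, so it is only comparable across runs that use the same weights.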

Layer Value Breakdown

Total correct: 950/1050

Breakdown:

L0 confident & correct: 800 (76.2%) ← Fast path
L0 confident & wrong: 50 (4.8%) ← L0 failures
L1 corrected L0: 60 (5.7%) ← L1 value
L1 confirmed L0: 100 (9.5%) ← L1 validation
L2 final decision: 40 (3.8%) ← L2 value
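The percentages above are each bucket's share of all 1050 samples. A short sketch that reproduces them from the counts (the `shares` helper is illustrative):

```python
breakdown = {
    "L0 confident & correct": 800,
    "L0 confident & wrong": 50,
    "L1 corrected L0": 60,
    "L1 confirmed L0": 100,
    "L2 final decision": 40,
}


def shares(counts: dict) -> dict:
    """Each bucket as a percentage of all samples, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


pct = shares(breakdown)
for name, p in pct.items():
    print(f"{name}: {breakdown[name]} ({p}%)")
```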

=== CASCADE EFFECTIVENESS ===

L0 Bouncer:
  Coverage: 85% (handled without escalation)
  Accuracy when confident: 94%
  False confidence rate: 6%

L1 Analyst:
  Samples received: 150
  Correction rate: 40% (fixed 60 L0 mistakes)
  Agreement rate: 60%
  Value added: +12% accuracy on uncertain samples

L2 Gauntlet:
  Samples received: 40
  Final accuracy: 85%

Overall:
  Accuracy: 90.5%
  Efficiency: 85% handled by L0 alone
  Effective cost: 1.35x (vs 3x if all went to L2)
