@bigsnarfdude
Last active November 21, 2025 12:48
calculate.md

Layer Effectiveness Metrics

L0 Bouncer Effectiveness

| Metric | What it measures | Calculation |
|---|---|---|
| Coverage | % handled without escalation | `(total - needs_l1) / total` |
| Accuracy | When confident, is it correct? | `(confident & correct) / confident` |
| False confidence | Confident but wrong | `(confident & incorrect) / confident` |
| Escalation rate | % sent to L1 | `needs_l1 / total` |

Goal: High coverage + high accuracy = fast and reliable
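A minimal Python sketch of the L0 metrics, assuming each sample is stored as a hypothetical `L0Result` record holding L0's prediction, a confidence flag (not confident means the sample `needs_l1`), and the ground-truth label:

```python
from dataclasses import dataclass


@dataclass
class L0Result:
    """One sample as seen by L0 (hypothetical record layout)."""
    pred: str        # L0's predicted label
    confident: bool  # False means the sample is escalated to L1 (needs_l1)
    label: str       # ground-truth label


def l0_metrics(results: list[L0Result]) -> dict[str, float]:
    total = len(results)
    if total == 0:
        raise ValueError("no samples")
    confident = [r for r in results if r.confident]
    needs_l1 = total - len(confident)
    correct = sum(r.pred == r.label for r in confident)
    # Avoid dividing by zero when L0 is never confident (both rates become 0).
    n_conf = len(confident) or 1
    return {
        "coverage": (total - needs_l1) / total,
        "accuracy": correct / n_conf,
        "false_confidence": (len(confident) - correct) / n_conf,
        "escalation_rate": needs_l1 / total,
    }
```

By construction, coverage and escalation rate sum to 1, as do accuracy and false confidence, so two numbers are enough to summarize L0.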


L1 Analyst Effectiveness

| Metric | What it measures | Calculation |
|---|---|---|
| Correction rate | L0 mistakes that L1 fixed | `(l1_correct & l0_wrong) / l1_samples` |
| Agreement rate | L1 confirmed L0's prediction | `(l1_pred == l0_pred) / l1_samples` |
| Value added | Accuracy gain over L0 on the same samples | `l1_accuracy - l0_accuracy_on_same` |
| Escalation rate | % sent to L2 | `needs_l2 / l1_samples` |

Goal: High correction rate = worth the compute cost
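The L1 metrics can be sketched the same way, assuming a hypothetical `L1Sample` record for each escalated sample that keeps both layers' predictions:

```python
from dataclasses import dataclass


@dataclass
class L1Sample:
    """A sample escalated to L1 (hypothetical record layout)."""
    l0_pred: str
    l1_pred: str
    label: str
    needs_l2: bool  # True if L1 escalates the sample to L2


def l1_metrics(samples: list[L1Sample]) -> dict[str, float]:
    n = len(samples)
    if n == 0:
        raise ValueError("no samples reached L1")
    l1_correct = [s.l1_pred == s.label for s in samples]
    l0_correct = [s.l0_pred == s.label for s in samples]
    # L1 right where L0 was wrong
    corrections = sum(c1 and not c0 for c1, c0 in zip(l1_correct, l0_correct))
    agreements = sum(s.l1_pred == s.l0_pred for s in samples)
    return {
        "correction_rate": corrections / n,
        "agreement_rate": agreements / n,
        # L1 accuracy minus L0 accuracy on the same escalated samples
        "value_added": (sum(l1_correct) - sum(l0_correct)) / n,
        "escalation_rate": sum(s.needs_l2 for s in samples) / n,
    }
```

Value added works out to the correction rate minus the rate at which L1 overturns a correct L0 answer, so a high correction rate only pays for the compute if L1 rarely breaks what L0 got right.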


L2 Gauntlet Effectiveness

| Metric | What it measures | Calculation |
|---|---|---|
| Final arbiter accuracy | Correct on the hardest cases | `l2_correct / l2_samples` |
| Tiebreaker value | L0/L1 disagreements resolved correctly | `resolved_correct / disagreements` |
| Policy detection | Which policies catch which violations | Violations per policy |

Goal: High accuracy on edge cases L0+L1 couldn't handle
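A sketch for L2, assuming a hypothetical `L2Sample` record that also lists the policies that flagged the sample. Here "disagreements" is taken to mean samples where L0 and L1 predicted different labels, which is an assumption:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class L2Sample:
    """A sample escalated to L2 (hypothetical record layout)."""
    l0_pred: str
    l1_pred: str
    l2_pred: str
    label: str
    # Names of the policies that flagged this sample (hypothetical field)
    flagged_by: list[str] = field(default_factory=list)


def l2_metrics(samples: list[L2Sample]) -> dict:
    n = len(samples)
    l2_correct = sum(s.l2_pred == s.label for s in samples)
    # Samples where the two cheaper layers disagreed
    disagreements = [s for s in samples if s.l0_pred != s.l1_pred]
    resolved_correct = sum(s.l2_pred == s.label for s in disagreements)
    # How many escalated samples each policy flagged
    policy_hits = Counter(p for s in samples for p in s.flagged_by)
    return {
        "final_accuracy": l2_correct / n if n else 0.0,
        "tiebreaker_value": resolved_correct / len(disagreements) if disagreements else 0.0,
        "policy_detection": dict(policy_hits),
    }
```

`policy_detection` counts flags, not confirmed violations; restricting the count to samples whose label is a violation would need a label scheme the metrics table does not specify.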


Cascade System Effectiveness

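| Metric | What it measures | Calculation |
|---|---|---|
| Overall accuracy | Final prediction correct | `all_correct / total` |
| Efficiency | Compute saved (share resolved by L0 alone) | `l0_only / total` |
| Cost ratio | Compute vs. accuracy tradeoff (compute per sample, weighting L0/L1/L2 at 1/10/100) | `(l0*1 + l1*10 + l2*100) / total` |
| Layer value | Each layer's contribution | See below |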
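A sketch of the cascade-level metrics, reading the cost-ratio formula as `(l0*1 + l1*10 + l2*100) / total`. It assumes a hypothetical `CascadeSample` record with the deepest layer a sample reached, and it also assumes that `l0`, `l1`, `l2` count every sample a layer processed (an escalated sample is charged for each layer it passed through):

```python
from dataclasses import dataclass

# Relative per-sample cost of running each layer, from the cost-ratio
# formula (L0 = 1, L1 = 10, L2 = 100).
LAYER_COST = {0: 1, 1: 10, 2: 100}


@dataclass
class CascadeSample:
    """Outcome for one sample (hypothetical record layout)."""
    deepest_layer: int   # 0, 1, or 2: last layer that processed the sample
    final_correct: bool  # whether the cascade's final prediction was correct


def cascade_metrics(samples: list[CascadeSample]) -> dict[str, float]:
    total = len(samples)
    if total == 0:
        raise ValueError("no samples")
    # Assumption: a sample that reached layer k was also processed by every
    # layer below it, so l0 counts all samples, l1 those reaching L1, etc.
    processed = {k: sum(s.deepest_layer >= k for s in samples) for k in LAYER_COST}
    cost = sum(LAYER_COST[k] * processed[k] for k in LAYER_COST)
    return {
        "overall_accuracy": sum(s.final_correct for s in samples) / total,
        "efficiency": sum(s.deepest_layer == 0 for s in samples) / total,
        "cost_ratio": cost / total,  # in units of one L0 call
    }
```

If `l0`, `l1`, `l2` are instead meant as the number of samples *resolved* at each layer, only the `processed` line changes (`==` instead of `>=`).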

Layer Value Breakdown

Total correct: 950/1050

Breakdown (percentages of all 1050 samples):

  • L0 confident & correct: 800 (76.2%) ← Fast path
  • L0 confident & wrong: 50 (4.8%) ← L0 failures
  • L1 corrected L0: 60 (5.7%) ← L1 value
  • L1 confirmed L0: 100 (9.5%) ← L1 validation
  • L2 final decision: 40 (3.8%) ← L2 value
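The five buckets add up to all 1050 samples (800 + 50 + 60 + 100 + 40), so each percentage is a share of the full set, not of the 950 correct answers. A quick Python check of the listed shares:

```python
def breakdown_shares(buckets: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of all samples falling in each bucket, rounded to 0.1."""
    return {name: round(100 * count / total, 1) for name, count in buckets.items()}


buckets = {
    "L0 confident & correct": 800,
    "L0 confident & wrong": 50,
    "L1 corrected L0": 60,
    "L1 confirmed L0": 100,
    "L2 final decision": 40,
}

for name, pct in breakdown_shares(buckets, total=1050).items():
    print(f"{name}: {buckets[name]} ({pct}%)")
```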

=== CASCADE EFFECTIVENESS ===

L0 Bouncer:

  • Coverage: 85% (handled without escalation)
  • Accuracy when confident: 94%
  • False confidence rate: 6%

L1 Analyst:

  • Samples received: 150
  • Correction rate: 40% (fixed 60 L0 mistakes)
  • Agreement rate: 60%
  • Value added: +12% accuracy on uncertain samples

L2 Gauntlet:

  • Samples received: 40
  • Final accuracy: 85%

Overall:

  • Accuracy: 90.5%
  • Efficiency: 85% handled by L0 alone
  • Effective cost: 1.35x (vs 3x if all went to L2)
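A report like the one above can be rendered from precomputed metrics with a small formatter. This is a sketch: the dict keys are hypothetical, and rates are passed as fractions in [0, 1]:

```python
def format_report(m: dict[str, float]) -> str:
    """Render cascade metrics as a plain-text report (hypothetical keys)."""
    return "\n".join([
        "=== CASCADE EFFECTIVENESS ===",
        "L0 Bouncer:",
        f"  Coverage: {m['l0_coverage']:.0%} (handled without escalation)",
        f"  Accuracy when confident: {m['l0_accuracy']:.0%}",
        f"  False confidence rate: {m['l0_false_confidence']:.0%}",
        "L1 Analyst:",
        f"  Samples received: {m['l1_samples']:.0f}",
        f"  Correction rate: {m['l1_correction_rate']:.0%}",
        f"  Agreement rate: {m['l1_agreement_rate']:.0%}",
        # '+' in the format spec always prints the sign of the accuracy delta
        f"  Value added: {m['l1_value_added']:+.0%} accuracy on uncertain samples",
        "L2 Gauntlet:",
        f"  Samples received: {m['l2_samples']:.0f}",
        f"  Final accuracy: {m['l2_accuracy']:.0%}",
        "Overall:",
        f"  Accuracy: {m['overall_accuracy']:.1%}",
        f"  Efficiency: {m['efficiency']:.0%} handled by L0 alone",
        f"  Effective cost: {m['cost_ratio']:.2f}x",
    ])
```

Keeping the metrics as raw fractions and converting only at print time avoids rounding errors creeping into later calculations such as the cost ratio.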
