Given prediction-golden-set-seed.csv,
when import runs,
then 20 cases are accepted with all required fields populated.
Pass: accepted_rows = 20 and schema_errors = 0.
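A minimal Python sketch of this import check. It assumes the seed CSV has a header row. The column names in REQUIRED_FIELDS are hypothetical placeholders, because the criteria do not list the required fields. A row counts as a schema error if any required field is missing or blank.

```python
import csv
import io

# Hypothetical column names: the real required fields of
# prediction-golden-set-seed.csv are not specified in the criteria.
REQUIRED_FIELDS = ("case_id", "risk_tier", "question")


def validate_import(csv_text: str) -> dict:
    """Count rows with every required field non-blank vs. rows that fail the schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    accepted_rows = 0
    schema_errors = 0
    for row in reader:
        # row.get() returns None for a column absent from the header or a short row.
        if any(not (row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            schema_errors += 1
        else:
            accepted_rows += 1
    return {"accepted_rows": accepted_rows, "schema_errors": schema_errors}


def import_passes(result: dict, expected_rows: int = 20) -> bool:
    """Pass rule: exactly the expected number of accepted rows and zero schema errors."""
    return result["accepted_rows"] == expected_rows and result["schema_errors"] == 0
```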
Given a golden set run, when evidence sync is complete, then at least 80% of cases have one or more evidence links.
Pass: evidence_coverage >= 80%.
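A sketch of the coverage metric, assuming each case is a dict with a hypothetical `evidence_links` list. The pass check compares integers (covered × 100 against threshold × total), so exactly 80% is not lost to floating-point rounding.

```python
def evidence_coverage(cases: list) -> float:
    """Fraction of cases with at least one evidence link (0.0 for an empty run)."""
    if not cases:
        return 0.0
    covered = sum(1 for case in cases if case.get("evidence_links"))
    return covered / len(cases)


def coverage_passes(cases: list, threshold_pct: int = 80) -> bool:
    """Pass rule: evidence_coverage >= threshold_pct, computed in integers."""
    if not cases:
        return False  # an empty golden set run cannot demonstrate coverage
    covered = sum(1 for case in cases if case.get("evidence_links"))
    return covered * 100 >= threshold_pct * len(cases)
```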
Given human-reviewed ground truth exists for high-risk cases, when the automatic scores are compared against that ground truth, then hit_rate >= 70%.
Pass: hit_rate >= 70%; if lower, do not promote to agent.
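The criteria do not define how hit_rate is computed. This sketch assumes it is the share of reviewed high-risk cases whose automatic `verdict` matches the human `ground_truth` label. Both field names are hypothetical. The promotion gate fails closed when no ground truth exists yet.

```python
def _reviewed_high_risk(cases: list) -> list:
    """High-risk cases that already carry a human ground-truth label."""
    return [
        c for c in cases
        if c.get("risk_tier") == "high" and c.get("ground_truth") is not None
    ]


def hit_rate(cases: list):
    """Assumed definition: agreement between automatic verdict and ground truth."""
    reviewed = _reviewed_high_risk(cases)
    if not reviewed:
        return None
    agree = sum(1 for c in reviewed if c.get("verdict") == c["ground_truth"])
    return agree / len(reviewed)


def may_promote_to_agent(cases: list, threshold_pct: int = 70) -> bool:
    """Promotion gate: hit_rate >= threshold_pct; no ground truth means no promotion."""
    reviewed = _reviewed_high_risk(cases)
    if not reviewed:
        return False
    agree = sum(1 for c in reviewed if c.get("verdict") == c["ground_truth"])
    return agree * 100 >= threshold_pct * len(reviewed)
```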
Given verdict is hit,
when a human reviewer marks the evidence or the interpretation as wrong,
then false_positive_flag = true.
Pass: false_positive_rate <= 15%.
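A sketch of the flag and the rate check. The `review_finding` field and its labels are hypothetical. The criteria also leave the denominator of false_positive_rate open, so this version assumes it is the share of `hit` verdicts that get flagged.

```python
# Hypothetical review labels for "wrong evidence" and "wrong interpretation".
WRONG_FINDINGS = {"wrong_evidence", "wrong_interpretation"}


def false_positive_flag(case: dict) -> bool:
    """True when a 'hit' verdict is contradicted by human review."""
    return case.get("verdict") == "hit" and case.get("review_finding") in WRONG_FINDINGS


def false_positive_passes(cases: list, max_pct: int = 15) -> bool:
    """Pass rule: flagged hits / all hits <= max_pct (assumed denominator)."""
    hits = [c for c in cases if c.get("verdict") == "hit"]
    flagged = sum(1 for c in hits if false_positive_flag(c))
    return flagged * 100 <= max_pct * len(hits)
```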
Given a case fails due to no_evidence, wrong_match, or overconfident scoring, when the runner finishes, then a regression case is created with an owner and a due date.
Pass: every failed high-impact case has a regression owner.
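A sketch of regression-case creation. The `failure_reason`, `owner`, and `impact` fields, the `overconfident_scoring` label, and the 7-day due window are all hypothetical. Here "high-impact" is assumed to mean `impact == "high"`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# no_evidence and wrong_match come from the criteria; the third label is assumed.
FAILURE_REASONS = {"no_evidence", "wrong_match", "overconfident_scoring"}


@dataclass
class RegressionCase:
    case_id: str
    reason: str
    owner: Optional[str]
    due: date


def create_regressions(results: list, default_owner: Optional[str],
                       today: date, sla_days: int = 7) -> list:
    """Open one regression case per failed result, with an owner and a due date."""
    regressions = []
    for r in results:
        reason = r.get("failure_reason")
        if reason in FAILURE_REASONS:
            regressions.append(RegressionCase(
                case_id=r["case_id"],
                reason=reason,
                owner=r.get("owner") or default_owner,
                due=today + timedelta(days=sla_days),
            ))
    return regressions


def regressions_pass(results: list, regressions: list) -> bool:
    """Pass rule: every failed high-impact case has an owned regression case."""
    owned = {rc.case_id for rc in regressions if rc.owner}
    return all(
        r["case_id"] in owned
        for r in results
        if r.get("failure_reason") in FAILURE_REASONS and r.get("impact") == "high"
    )
```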
Given risk_tier = high,
when the runner produces an automatic verdict,
then the final status remains needs_review until Louis or zihrou approves.
Pass: no high-risk case changes state automatically.
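A sketch of the high-risk gate. It assumes an approved case takes the automatic verdict as its final status, and that other tiers pass the verdict through. Neither behavior is stated in the criteria, and the function name is hypothetical. The approver names come from the criteria.

```python
APPROVERS = {"Louis", "zihrou"}


def final_status(risk_tier: str, auto_verdict: str, approved_by=None) -> str:
    """High-risk verdicts stay needs_review until an allowed approver signs off."""
    if risk_tier == "high":
        if approved_by in APPROVERS:
            return auto_verdict  # assumed: approval releases the automatic verdict
        return "needs_review"
    return auto_verdict  # assumed: lower tiers apply the verdict directly
```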
- HTML runner is openable.
- learning-memory.json parses.
- Gist URL returns HTTP 200.
- Gist file list includes required files.
- PLS upload-files reports uploaded count.
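Three of these checklist items can be scripted with the standard library. This sketch assumes the gist metadata comes from GitHub's REST API, whose gist response keys a `files` object by filename. The required file set is a hypothetical argument. The PLS upload-files count is left out because that tool's output format is not described here.

```python
import json
import urllib.error
import urllib.request


def json_parses(text: str) -> bool:
    """True if the text (e.g. the contents of learning-memory.json) is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def url_returns_200(url: str, timeout: float = 10.0) -> bool:
    """True if a GET on the URL ends in HTTP 200 (urlopen follows redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):  # HTTPError is a URLError subclass
        return False


def missing_gist_files(gist_payload: dict, required: set) -> set:
    """Required filenames absent from the gist's 'files' mapping."""
    return set(required) - set(gist_payload.get("files") or {})
```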