@gHashTag
Created June 15, 2026 08:52
A 12-cell multiverse where a 3-way significance verdict stays invariant (runnable Rust). Apache-2.0.

A 12-cell multiverse where a 3-way significance verdict stays invariant (runnable Rust)

A small, fully-runnable worked example for the multiverse / specification-curve methods line (Steegen et al. 2016; Simonsohn et al. specification curve; Gelman and Loken garden-of-forking-paths). It is a data point, not a claim about any framework.

The question

When does a significance verdict computed from ONE analysis configuration actually generalize across the multiverse of defensible configurations? The usual worry is that a single config hides a forking-paths problem. Here is a positive (H0) case where the verdict is invariant, with the full grid and a deterministic harness.

The setup

A two-arm reconstruction experiment emits a continuous loss per seed (lower is better). The analysis computes a Welch two-sample t-test on mean_diff = mean(A) - mean(B) and a 3-way verdict {AWins, Tie, BWins} at alpha = 0.05. The "multiverse" varies one analysis/design knob at a time around an anchor (n_seeds = 8, seed_start = 43, steps = 40, warmup = 8, dim = 128). The full grid, run against the original engine (race_oracle.rs, output in report.json), covers five knobs:

  • number of seeds {6, 8, 12, 16}
  • seed_start {1, 43, 100}
  • training steps {20, 40, 80}
  • warmup {4, 8, 16}
  • model dim {64, 128, 256}

12 cells total: the anchor plus 3 + 2 + 2 + 2 + 2 single-knob variations (one-factor-at-a-time around the anchor).
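
The cell count can be checked with a short sketch. It assumes the knob levels and anchor listed above; the names LEVELS, ANCHOR, ofat_cells and cartesian_size are illustrative and do not appear in the original files.

```rust
// Sketch: count cells in a one-factor-at-a-time (OFAT) sweep and compare
// with the full Cartesian product. Levels and anchor come from the text.

/// Levels per knob, in the order: n_seeds, seed_start, steps, warmup, dim.
const LEVELS: [&[u64]; 5] = [
    &[6, 8, 12, 16],
    &[1, 43, 100],
    &[20, 40, 80],
    &[4, 8, 16],
    &[64, 128, 256],
];

/// Anchor value for each knob (same order as LEVELS).
const ANCHOR: [u64; 5] = [8, 43, 40, 8, 128];

/// The anchor, then every cell that differs from it in exactly one knob.
fn ofat_cells() -> Vec<[u64; 5]> {
    let mut cells = vec![ANCHOR];
    for (k, levels) in LEVELS.iter().enumerate() {
        for &v in levels.iter() {
            if v != ANCHOR[k] {
                let mut c = ANCHOR;
                c[k] = v;
                cells.push(c);
            }
        }
    }
    cells
}

/// Size of the full cross-product of all knob levels.
fn cartesian_size() -> usize {
    LEVELS.iter().map(|l| l.len()).product()
}

fn main() {
    println!("OFAT cells: {}", ofat_cells().len());
    println!("Cartesian cells: {}", cartesian_size());
}
```

The sweep visits 12 cells, while the full cross-product would have 4 * 3 * 3 * 3 * 3 = 324.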

The result (full engine, 12 cells)

From report.json (the original engine driven by race_oracle.rs):

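  • verdict_counts = {AWins:0, Tie:0, BWins:12}; distinct_verdicts = 1. (report.json uses the engine's arm names: PhiWins corresponds to AWins and ZooWins to BWins.)
  • The verdict is invariant across all 12 cells (decision = H0_VERDICT_INVARIANT in report.json).
  • mean_diff ranges +0.6444..+0.6992; two-sided p < 1e-6 in every cell (report.json prints p to six decimals, so every entry reads 0.000000).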

So on this grid the single-config verdict does generalize, and the effect is far from the alpha boundary, so small-N Welch power is not the concern. Honest limit: the grid is one-factor-at-a-time around one anchor (12 of the 324 cells in the full cross-product), not a full cross-product or a Latin-hypercube sample, so a combination that moves several knobs away from the anchor at once could still differ.
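
As a cross-check, the summary numbers can be recomputed from the per-cell values. The sketch below copies the 12 mean_diff values from report.json and applies the same "exactly one distinct verdict" rule that core.rs uses; the helper names min_max and decision are illustrative, not part of the original files.

```rust
// Sketch: recompute the mean_diff range and the invariance decision.
// MEAN_DIFFS is copied from the "cells" array in report.json, in file order.

const MEAN_DIFFS: [f64; 12] = [
    0.669779, 0.661912, 0.662992, 0.652933, 0.644408, 0.657100,
    0.660109, 0.674746, 0.676508, 0.648499, 0.699157, 0.646047,
];

/// Smallest and largest value of a non-empty slice.
fn min_max(xs: &[f64]) -> (f64, f64) {
    let min = xs.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    (min, max)
}

/// counts = [AWins, Tie, BWins]. Invariant iff exactly one count is non-zero.
fn decision(counts: [usize; 3]) -> &'static str {
    let distinct = counts.iter().filter(|&&c| c > 0).count();
    if distinct == 1 { "VERDICT_INVARIANT" } else { "VERDICT_FLIPS" }
}

fn main() {
    let (lo, hi) = min_max(&MEAN_DIFFS);
    println!("mean_diff range: {:+.4}..{:+.4}", lo, hi);
    println!("decision: {}", decision([0, 0, 12]));
}
```

A single cell with a different verdict, for example counts of [1, 0, 11], would flip the decision to VERDICT_FLIPS.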

Reproduce without any dependency

core.rs is fully self-contained (no crates, no external code). It reimplements the Welch t-test + 3-way verdict and runs a smaller toy two-arm sweep over the two knobs that do not need a training engine (n_seeds, seed_start):

rustc -O core.rs -o core && ./core

It prints a 6-cell VERDICT_INVARIANT (all BWins, mean_diff ~0.66, p well below alpha), the same qualitative result as the full engine. The remaining three knobs (steps, warmup, dim) are training parameters and are exercised by race_oracle.rs, which links the original engine and produces the 12-cell report.json.
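
For reference, the test that core.rs reimplements can be written out as below. This restates the formulas in the welch and two_sided_p_from_t functions; the symbols (sample means, Bessel-corrected sample variances s^2, sample sizes n, degrees of freedom nu) are notation chosen here, and I denotes the regularized incomplete beta function.

```latex
\begin{aligned}
t   &= \frac{\bar{a} - \bar{b}}{\sqrt{\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}}}, \\[6pt]
\nu &= \frac{\left(\dfrac{s_a^2}{n_a} + \dfrac{s_b^2}{n_b}\right)^{2}}
            {\dfrac{\left(s_a^2/n_a\right)^{2}}{n_a - 1} + \dfrac{\left(s_b^2/n_b\right)^{2}}{n_b - 1}}, \\[6pt]
p   &= I_{\nu/(\nu + t^2)}\!\left(\tfrac{\nu}{2}, \tfrac{1}{2}\right).
\end{aligned}
```

The verdict is Tie unless p < alpha; otherwise the sign of the mean difference picks the arm with the lower loss.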

Files

  • core.rs -- SELF-CONTAINED Welch t-test + 3-way verdict classifier + a 6-cell toy sweep. Builds and runs with rustc alone, no dependencies.
  • race_oracle.rs -- the full 12-cell driver; links the original engine and emits deterministic JSON.
  • race_oracle_selftest.rs -- 16 tests, all PASS, every gate FAIL-reachable (verdict boundaries incl. strict < at p==alpha -> Tie, a buggy-classifier negative control, invariant detection, determinism).
  • report.json -- the 12-cell output from the full engine.

The two race_oracle* files are included as provenance for the full run; they import trios_trainer::multi_seed and are not standalone. core.rs is the standalone path.
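
The boundary rule and the negative control mentioned in the file list can be illustrated in isolation. This is a minimal sketch using the generic names from core.rs; verdict_buggy is a stand-in modeled on the self-test's deliberately wrong classifier, which ignores the sign of the difference.

```rust
// Sketch: strict `<` at the alpha boundary, plus a negative control that a
// test must be able to tell apart from the correct rule.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Verdict {
    AWins,
    Tie,
    BWins,
}

/// Correct rule: significant only if p < alpha (strict); sign picks the arm.
fn verdict(mean_diff: f64, p: f64, alpha: f64) -> Verdict {
    if p < alpha {
        if mean_diff < 0.0 { Verdict::AWins } else { Verdict::BWins }
    } else {
        Verdict::Tie
    }
}

/// Deliberately wrong rule: ignores the sign of mean_diff.
fn verdict_buggy(_mean_diff: f64, p: f64, alpha: f64) -> Verdict {
    if p < alpha { Verdict::AWins } else { Verdict::Tie }
}

fn main() {
    let alpha = 0.05;
    // At p == alpha the strict rule makes no claim.
    println!("p == alpha -> {:?}", verdict(0.5, 0.05, alpha));
    // With a positive mean_diff the two rules must disagree.
    println!(
        "correct: {:?}, buggy: {:?}",
        verdict(0.5, 0.01, alpha),
        verdict_buggy(0.5, 0.01, alpha)
    );
}
```

A check that passes for both classifiers would not be FAIL-reachable; the positive mean_diff case is the one that separates them.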

License: Apache-2.0. No ask. If you know of prior tiny-N multiverse worked examples, a pointer is welcome.

core.rs

//! core.rs -- standalone Welch two-sample t-test + 3-way significance verdict
//! + a one-factor-at-a-time multiverse sweep over a toy two-arm reconstruction
//! experiment.
//!
//! This file is SELF-CONTAINED: no external crates, no canon dependency. It is a
//! runnable worked example for the multiverse / specification-curve methods line
//! (Steegen et al. 2016; Simonsohn et al. specification curve; Gelman and Loken
//! garden-of-forking-paths). It is a data point, NOT a claim about any framework.
//!
//! Build + run: rustc -O core.rs -o core && ./core
//!
//! HONESTY: the per-seed loss here is a deterministic PROXY, not BPB and not an
//! optimality measure. A Tie is read jointly with n and effect size (small-N
//! Welch power caveat). No arm is declared "best" beyond this proxy.
/// A 3-way verdict at a chosen alpha. Names are generic on purpose (AWins/BWins):
/// this is a methods example, not a claim about a specific pair of methods.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Verdict {
AWins,
Tie,
BWins,
}
fn verdict_label(v: Verdict) -> &'static str {
match v {
Verdict::AWins => "AWins",
Verdict::Tie => "Tie",
Verdict::BWins => "BWins",
}
}
fn mean(xs: &[f64]) -> f64 {
xs.iter().sum::<f64>() / xs.len() as f64
}
/// Sample variance with Bessel's correction (n-1).
fn var(xs: &[f64]) -> f64 {
let m = mean(xs);
let n = xs.len() as f64;
if n < 2.0 {
return 0.0;
}
xs.iter().map(|x| (x - m) * (x - m)).sum::<f64>() / (n - 1.0)
}
/// Welch two-sample t-test. Returns (mean_diff = mean(a) - mean(b), t, df, p_two_sided).
fn welch(a: &[f64], b: &[f64]) -> (f64, f64, f64, f64) {
let (na, nb) = (a.len() as f64, b.len() as f64);
let (ma, mb) = (mean(a), mean(b));
let (va, vb) = (var(a), var(b));
let mean_diff = ma - mb;
let (sa, sb) = (va / na, vb / nb);
let denom = (sa + sb).sqrt();
if denom == 0.0 {
// zero standard error (both sample variances are 0): infinite separation if means differ, else a tie
if mean_diff == 0.0 {
return (0.0, 0.0, na + nb - 2.0, 1.0);
}
return (mean_diff, f64::INFINITY, na + nb - 2.0, 0.0);
}
let t = mean_diff / denom;
let df_num = (sa + sb) * (sa + sb);
let df_den = (sa * sa) / (na - 1.0) + (sb * sb) / (nb - 1.0);
let df = if df_den > 0.0 { df_num / df_den } else { na + nb - 2.0 };
let p = two_sided_p_from_t(t.abs(), df);
(mean_diff, t, df, p)
}
/// Two-sided p-value for |t| under Student-t with `df` df, via the regularized
/// incomplete beta function: p = I_{df/(df+t^2)}(df/2, 1/2).
fn two_sided_p_from_t(t_abs: f64, df: f64) -> f64 {
if !t_abs.is_finite() {
return 0.0;
}
let x = df / (df + t_abs * t_abs);
betai(df / 2.0, 0.5, x).clamp(0.0, 1.0)
}
fn betai(a: f64, b: f64, x: f64) -> f64 {
if x <= 0.0 {
return 0.0;
}
if x >= 1.0 {
return 1.0;
}
let ln_beta = ln_gamma(a + b) - ln_gamma(a) - ln_gamma(b);
let front = (a * x.ln() + b * (1.0 - x).ln() + ln_beta).exp();
if x < (a + 1.0) / (a + b + 2.0) {
front * betacf(a, b, x) / a
} else {
1.0 - front * betacf(b, a, 1.0 - x) / b
}
}
/// Lentz continued fraction for the incomplete beta function.
fn betacf(a: f64, b: f64, x: f64) -> f64 {
let tiny = 1e-30_f64;
let qab = a + b;
let qap = a + 1.0;
let qam = a - 1.0;
let mut c = 1.0;
let mut d = 1.0 - qab * x / qap;
if d.abs() < tiny {
d = tiny;
}
d = 1.0 / d;
let mut h = d;
for m in 1..200 {
let m_f = m as f64;
let m2 = 2.0 * m_f;
let aa = m_f * (b - m_f) * x / ((qam + m2) * (a + m2));
d = 1.0 + aa * d;
if d.abs() < tiny {
d = tiny;
}
c = 1.0 + aa / c;
if c.abs() < tiny {
c = tiny;
}
d = 1.0 / d;
h *= d * c;
let aa2 = -(a + m_f) * (qab + m_f) * x / ((a + m2) * (qap + m2));
d = 1.0 + aa2 * d;
if d.abs() < tiny {
d = tiny;
}
c = 1.0 + aa2 / c;
if c.abs() < tiny {
c = tiny;
}
d = 1.0 / d;
let del = d * c;
h *= del;
if (del - 1.0).abs() < 3e-12 {
break;
}
}
h
}
/// Lanczos approximation to ln(Gamma(z)).
fn ln_gamma(z: f64) -> f64 {
let g = 7.0;
let c = [
0.99999999999980993,
676.5203681218851,
-1259.1392167224028,
771.32342877765313,
-176.61502916214059,
12.507343278686905,
-0.13857109526572012,
9.9843695780195716e-6,
1.5056327351493116e-7,
];
if z < 0.5 {
// reflection formula in log space: ln|Gamma(z)| = ln(pi / |sin(pi z)|) - ln Gamma(1 - z)
(std::f64::consts::PI / (std::f64::consts::PI * z).sin().abs()).ln() - ln_gamma(1.0 - z)
} else {
let z = z - 1.0;
let mut x = c[0];
for (i, &ci) in c.iter().enumerate().skip(1) {
x += ci / (z + i as f64);
}
let t = z + g + 0.5;
0.5 * (2.0 * std::f64::consts::PI).ln() + (z + 0.5) * t.ln() - t + x.ln()
}
}
fn verdict(mean_diff: f64, p: f64, alpha: f64) -> Verdict {
if p < alpha {
// strict `<`: at p == alpha we declare Tie (no claim at the boundary)
if mean_diff < 0.0 {
Verdict::AWins // arm A has the lower (better) loss
} else {
Verdict::BWins
}
} else {
Verdict::Tie
}
}
/// Deterministic per-seed loss for a toy two-arm reconstruction experiment.
/// Arm B is constructed to reconstruct slightly better on this proxy; the
/// per-seed jitter is a fixed function of the seed (no RNG -> reproducible).
fn arm_losses(seed_start: u64, n_seeds: usize, base: f64, gap: f64) -> (Vec<f64>, Vec<f64>) {
let mut a = Vec::with_capacity(n_seeds);
let mut b = Vec::with_capacity(n_seeds);
for k in 0..n_seeds as u64 {
let s = seed_start + k;
// deterministic jitter in [-0.05, 0.05) from a simple hash of the seed
let j = (((s.wrapping_mul(2654435761)) % 1000) as f64 / 1000.0 - 0.5) * 0.1;
a.push(base + j);
b.push(base - gap + j * 0.5);
}
(a, b)
}
#[derive(Clone, Copy)]
struct Cell {
n_seeds: usize,
seed_start: u64,
alpha: f64,
}
fn build_grid() -> Vec<(&'static str, Cell)> {
let anchor = Cell { n_seeds: 8, seed_start: 43, alpha: 0.05 };
let mut cells = vec![("anchor", anchor)];
for &n in &[6usize, 12, 16] {
cells.push(("vary_n", Cell { n_seeds: n, ..anchor }));
}
for &s in &[1u64, 100] {
cells.push(("vary_seed", Cell { seed_start: s, ..anchor }));
}
cells
}
fn main() {
let grid = build_grid();
println!("multiverse spec-curve (toy two-arm, deterministic proxy loss)");
println!("{:<12} {:>3} {:>7} {:>8} {:>12} {:>12}", "knob", "n", "sstart", "verdict", "mean_diff", "p");
let mut counts = [0usize; 3]; // AWins, Tie, BWins
for (knob, c) in &grid {
let (a, b) = arm_losses(c.seed_start, c.n_seeds, 4.72, 0.67);
let (md, _t, _df, p) = welch(&a, &b);
let v = verdict(md, p, c.alpha);
match v {
Verdict::AWins => counts[0] += 1,
Verdict::Tie => counts[1] += 1,
Verdict::BWins => counts[2] += 1,
}
println!("{:<12} {:>3} {:>7} {:>8} {:>12.4} {:>12.2e}", knob, c.n_seeds, c.seed_start, verdict_label(v), md, p);
}
let distinct = counts.iter().filter(|&&c| c > 0).count();
println!("verdict_counts: AWins={} Tie={} BWins={} (distinct={})", counts[0], counts[1], counts[2], distinct);
println!("decision: {}", if distinct == 1 { "VERDICT_INVARIANT" } else { "VERDICT_FLIPS" });
}

race_oracle.rs

//! race_oracle -- IGLA-RACE accuracy-verdict config-robustness (multiverse) sweep.
//!
//! WAVE LOOP 2026-06-15 (RACE axis). RUST-ONLY new tooling; canon read-only
//! (this bin is untracked). It drives the EXISTING harness math
//! `trios_trainer::multi_seed::run_multi_seed` over a finite grid of DEFENSIBLE
//! configurations (a multiverse), holding the harness math fixed and varying only
//! the knobs an experimenter could legitimately pick (n_seeds, seed_start, steps,
//! warmup, dim). For each cell it records the 3-way accuracy verdict
//! {PhiWins, Tie, ZooWins} plus mean_diff / p / df and the BREADTH conversion
//! counts, then decides:
//! H0 accuracy verdict INVARIANT across the whole grid (W-22 not a fragility)
//! H1 accuracy verdict FLIPS (>=2 distinct verdicts) (W-22 confirmed)
//! H2 breadth axis (phi_lossy=0 < zoo_lossy) INVARIANT (moat not a config artifact)
//!
//! HONESTY: proxy_bits is a PROXY in bits, explicitly NOT BPB. No format is
//! declared "best". A Tie is read jointly with n + effect size (small-N Welch
//! power caveat). The accuracy axis NEVER promotes/demotes FL-002 (stays
//! [Open conjecture]). Output JSON is deterministic (stable cell ordering).
use trios_trainer::multi_seed::{run_multi_seed, F2Verdict};
#[derive(Clone, Copy)]
struct Cell {
n_seeds: usize,
seed_start: u64,
steps: usize,
warmup: usize,
dim: usize,
alpha: f64,
}
struct CellResult {
cell: Cell,
verdict: F2Verdict,
mean_diff: f64,
p_two_sided: f64,
df: f64,
phi_lossy: u64,
zoo_lossy: u64,
phi_coherent: u64,
}
fn verdict_str(v: F2Verdict) -> &'static str {
match v {
F2Verdict::PhiWins => "PhiWins",
F2Verdict::Tie => "Tie",
F2Verdict::ZooWins => "ZooWins",
}
}
/// Build the pre-registered grid. Default anchor cell (8,43,40,8,128) is included.
/// Knobs held legitimate: n_seeds {6,8,12,16}, seed_start {1,43,100},
/// steps {20,40,80}, warmup {4,8,16}, dim {64,128,256}. alpha fixed at 0.05.
/// Full Cartesian = 4*3*3*3*3 = 324 cells. To stay inside the per-call CPU budget
/// we sweep one knob at a time around the anchor (a "one-factor-at-a-time"
/// multiverse), which is the honest, reproducible subset; the count is printed.
fn build_grid() -> Vec<Cell> {
let anchor = Cell {
n_seeds: 8,
seed_start: 43,
steps: 40,
warmup: 8,
dim: 128,
alpha: 0.05,
};
let mut cells = vec![anchor];
for &n in &[6usize, 12, 16] {
cells.push(Cell { n_seeds: n, ..anchor });
}
for &s in &[1u64, 100] {
cells.push(Cell { seed_start: s, ..anchor });
}
for &st in &[20usize, 80] {
cells.push(Cell { steps: st, ..anchor });
}
for &w in &[4usize, 16] {
cells.push(Cell { warmup: w, ..anchor });
}
for &d in &[64usize, 256] {
cells.push(Cell { dim: d, ..anchor });
}
cells
}
fn run_cell(c: Cell) -> CellResult {
let seeds: Vec<u64> = (c.seed_start..c.seed_start + c.n_seeds as u64).collect();
let r = run_multi_seed(&seeds, c.steps, c.warmup, c.dim, c.alpha);
CellResult {
cell: c,
verdict: r.verdict,
mean_diff: r.mean_diff,
p_two_sided: r.p_two_sided,
df: r.df,
phi_lossy: r.phi.total_lossy,
zoo_lossy: r.zoo.total_lossy,
phi_coherent: r.phi.total_coherent,
}
}
fn main() {
let grid = build_grid();
let n = grid.len();
eprintln!("race_oracle: running {n} cells (one-factor-at-a-time multiverse around anchor)");
let mut results: Vec<CellResult> = Vec::with_capacity(n);
for (i, &c) in grid.iter().enumerate() {
eprintln!(
" cell {}/{}: n_seeds={} seed_start={} steps={} warmup={} dim={}",
i + 1,
n,
c.n_seeds,
c.seed_start,
c.steps,
c.warmup,
c.dim
);
results.push(run_cell(c));
}
// H0/H1: distinct accuracy verdicts across the grid.
let mut verdict_phi = 0usize;
let mut verdict_tie = 0usize;
let mut verdict_zoo = 0usize;
for r in &results {
match r.verdict {
F2Verdict::PhiWins => verdict_phi += 1,
F2Verdict::Tie => verdict_tie += 1,
F2Verdict::ZooWins => verdict_zoo += 1,
}
}
let distinct = (verdict_phi > 0) as usize + (verdict_tie > 0) as usize + (verdict_zoo > 0) as usize;
let accuracy_verdict_invariant = distinct == 1;
// H2: breadth axis invariant -- phi_lossy==0 AND zoo_lossy>phi_lossy in EVERY cell.
let breadth_invariant = results
.iter()
.all(|r| r.phi_lossy == 0 && r.zoo_lossy > r.phi_lossy);
let decision = if accuracy_verdict_invariant {
"H0_VERDICT_INVARIANT"
} else {
"H1_VERDICT_FLIPS"
};
// Deterministic JSON. Cells in build order (stable, reproducible).
let mut out = String::new();
out.push_str("{\n");
out.push_str(" \"loop\": \"2026-06-15-race\",\n");
out.push_str(" \"artefact\": \"IGLA-RACE F2 breadth-as-moat harness (proxy accuracy axis)\",\n");
out.push_str(" \"proxy_disclaimer\": \"proxy_bits is an MSE-reconstruction proxy in bits, NOT BPB; no optimality/capability claim; FL-002 stays Open_conjecture\",\n");
out.push_str(&format!(" \"n_cells\": {n},\n"));
out.push_str(" \"alpha\": 0.05,\n");
out.push_str(" \"cells\": [\n");
for (i, r) in results.iter().enumerate() {
let comma = if i + 1 < results.len() { "," } else { "" };
out.push_str(&format!(
" {{\"n_seeds\": {}, \"seed_start\": {}, \"steps\": {}, \"warmup\": {}, \"dim\": {}, \"verdict\": \"{}\", \"mean_diff\": {:.6}, \"p_two_sided\": {:.6}, \"df\": {:.4}, \"phi_lossy\": {}, \"zoo_lossy\": {}, \"phi_coherent\": {}}}{}\n",
r.cell.n_seeds, r.cell.seed_start, r.cell.steps, r.cell.warmup, r.cell.dim,
verdict_str(r.verdict), r.mean_diff, r.p_two_sided, r.df,
r.phi_lossy, r.zoo_lossy, r.phi_coherent, comma
));
}
out.push_str(" ],\n");
out.push_str(&format!(" \"verdict_counts\": {{\"PhiWins\": {verdict_phi}, \"Tie\": {verdict_tie}, \"ZooWins\": {verdict_zoo}}},\n"));
out.push_str(&format!(" \"distinct_verdicts\": {distinct},\n"));
out.push_str(&format!(" \"accuracy_verdict_invariant\": {accuracy_verdict_invariant},\n"));
out.push_str(&format!(" \"breadth_invariant\": {breadth_invariant},\n"));
out.push_str(&format!(" \"decision\": \"{decision}\",\n"));
out.push_str(" \"h2_breadth_note\": \"breadth axis is the moat; accuracy axis cannot promote/demote FL-002 regardless of decision\"\n");
out.push_str("}\n");
print!("{out}");
}

race_oracle_selftest.rs

//! race_oracle_selftest -- deterministic, FAIL-reachable self-test for the
//! IGLA-RACE verdict-robustness oracle. Every gate has a negative control.
//! Exits non-zero on any FAIL. RUST-ONLY; canon read-only (untracked bin).
//!
//! It re-implements the oracle's PURE decision logic (verdict classifier, breadth
//! invariant, H0/H1 collapse, no-fabricated-metric) and tests it against
//! hand-built fixtures AND a small REAL engine cell, so a real bug in the logic is
//! caught. The live engine math (Welch / betai) is unit-tested in multi_seed.rs;
//! here we test the OVERLAY logic the oracle adds on top.
use trios_trainer::multi_seed::{run_multi_seed, F2Verdict};
// ---- pure logic mirrored from race_oracle (single source of truth for the test) ----
/// Classify a verdict from (p_two_sided, mean_diff, alpha) -- the EXACT source rule.
fn classify(p_two_sided: f64, mean_diff: f64, alpha: f64) -> F2Verdict {
if p_two_sided < alpha {
if mean_diff < 0.0 {
F2Verdict::PhiWins
} else {
F2Verdict::ZooWins
}
} else {
F2Verdict::Tie
}
}
/// A deliberately-WRONG classifier (negative control): ignores the sign of diff.
fn classify_buggy(p_two_sided: f64, _mean_diff: f64, alpha: f64) -> F2Verdict {
if p_two_sided < alpha {
F2Verdict::PhiWins
} else {
F2Verdict::Tie
}
}
/// Count distinct verdicts -> H0 (invariant) iff exactly one distinct.
fn accuracy_invariant(verdicts: &[F2Verdict]) -> bool {
let p = verdicts.iter().any(|v| *v == F2Verdict::PhiWins);
let t = verdicts.iter().any(|v| *v == F2Verdict::Tie);
let z = verdicts.iter().any(|v| *v == F2Verdict::ZooWins);
(p as usize + t as usize + z as usize) == 1
}
/// Breadth invariant for one cell: phi has zero lossy AND fewer lossy than zoo.
fn breadth_cell_ok(phi_lossy: u64, zoo_lossy: u64) -> bool {
phi_lossy == 0 && zoo_lossy > phi_lossy
}
fn main() {
let mut pass = 0usize;
let mut fail = 0usize;
macro_rules! check {
($name:expr, $cond:expr) => {{
if $cond {
pass += 1;
println!("PASS {}", $name);
} else {
fail += 1;
println!("FAIL {}", $name);
}
}};
}
let alpha = 0.05;
// T1 classifier: significant + diff<0 -> PhiWins.
check!("T1 classify p<a & diff<0 -> PhiWins",
classify(0.01, -0.5, alpha) == F2Verdict::PhiWins);
// T2 classifier: significant + diff>0 -> ZooWins.
check!("T2 classify p<a & diff>0 -> ZooWins",
classify(0.01, 0.5, alpha) == F2Verdict::ZooWins);
// T3 classifier: not significant -> Tie (regardless of diff sign).
check!("T3 classify p>=a -> Tie",
classify(0.20, -0.5, alpha) == F2Verdict::Tie
&& classify(0.20, 0.9, alpha) == F2Verdict::Tie);
// T4 boundary: p exactly == alpha is NOT < alpha -> Tie.
check!("T4 classify p==alpha -> Tie (strict <)",
classify(0.05, -0.5, alpha) == F2Verdict::Tie);
// T5 NEGATIVE CONTROL: the buggy classifier MUST disagree with the correct one
// on a ZooWins case (shows the test can SEE a wrong rule -> FAIL-reachable).
check!("T5 negctrl buggy classifier mislabels ZooWins as PhiWins",
classify(0.01, 0.5, alpha) == F2Verdict::ZooWins
&& classify_buggy(0.01, 0.5, alpha) == F2Verdict::PhiWins
&& classify(0.01, 0.5, alpha) != classify_buggy(0.01, 0.5, alpha));
// T6 H0/H1: all-same verdict set -> invariant (H0).
check!("T6 all-Tie -> accuracy_invariant true (H0)",
accuracy_invariant(&[F2Verdict::Tie, F2Verdict::Tie, F2Verdict::Tie]));
// T7 H0/H1: mixed verdict set -> NOT invariant (H1). (FAIL-reachable: if the
// collapse logic were broken this would wrongly report invariant.)
check!("T7 mixed -> accuracy_invariant false (H1)",
!accuracy_invariant(&[F2Verdict::Tie, F2Verdict::PhiWins, F2Verdict::Tie]));
// T8 H0/H1 edge: empty set is vacuously NOT exactly-one-distinct.
check!("T8 empty verdict set -> not invariant",
!accuracy_invariant(&[]));
// T9 breadth invariant holds on a clean cell.
check!("T9 breadth ok: phi_lossy=0, zoo_lossy=4",
breadth_cell_ok(0, 4));
// T10 NEGATIVE CONTROL: planted violation (phi has lossy) MUST trip the guard.
check!("T10 negctrl breadth violation phi_lossy=1 trips",
!breadth_cell_ok(1, 4));
// T11 NEGATIVE CONTROL: zoo not greater -> trips.
check!("T11 negctrl breadth zoo_lossy==phi_lossy trips",
!breadth_cell_ok(0, 0));
// T12 no-fabricated-metric: a PENDING cell carries no numeric verdict. We model
// PENDING as Option<F2Verdict>::None and assert it can never be coerced to a
// concrete verdict by the test harness.
let pending: Option<F2Verdict> = None;
check!("T12 PENDING cell has no verdict (no fabricated metric)",
pending.is_none());
// T13 determinism: the live engine on a fixed small cell yields byte-identical
// verdict + lossy counts across two independent calls.
let seeds: Vec<u64> = (43..49).collect(); // n=6, cheap
let r1 = run_multi_seed(&seeds, 20, 4, 64, alpha);
let r2 = run_multi_seed(&seeds, 20, 4, 64, alpha);
check!("T13 engine deterministic (verdict + lossy identical across runs)",
r1.verdict == r2.verdict
&& r1.phi.total_lossy == r2.phi.total_lossy
&& r1.zoo.total_lossy == r2.zoo.total_lossy
&& r1.mean_diff.to_bits() == r2.mean_diff.to_bits());
// T14 engine breadth signature on a REAL cell: phi incurs zero lossy, zoo > 0.
check!("T14 real cell breadth: phi_lossy=0 < zoo_lossy",
breadth_cell_ok(r1.phi.total_lossy, r1.zoo.total_lossy));
// T15 engine verdict is one of the three legal values (never a 4th state).
check!("T15 real cell verdict in {PhiWins,Tie,ZooWins}",
matches!(r1.verdict, F2Verdict::PhiWins | F2Verdict::Tie | F2Verdict::ZooWins));
// T16 classifier agrees with the ENGINE own verdict on the real cell -- shows
// the oracle overlay reproduces the engine rule (cross-check, FAIL-reachable if
// the mirrored rule drifts from the source).
check!("T16 mirrored classifier == engine verdict on real cell",
classify(r1.p_two_sided, r1.mean_diff, r1.alpha) == r1.verdict);
println!("\nrace_oracle_selftest: {pass} PASS / {fail} FAIL (total {})", pass + fail);
if fail > 0 {
std::process::exit(1);
}
}

report.json

{
"loop": "2026-06-15-race",
"artefact": "IGLA-RACE F2 breadth-as-moat harness (proxy accuracy axis)",
"proxy_disclaimer": "proxy_bits is an MSE-reconstruction proxy in bits, NOT BPB; no optimality/capability claim; FL-002 stays Open_conjecture",
"n_cells": 12,
"alpha": 0.05,
"cells": [
{"n_seeds": 8, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.669779, "p_two_sided": 0.000000, "df": 10.8558, "phi_lossy": 0, "zoo_lossy": 1024, "phi_coherent": 1024},
{"n_seeds": 6, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.661912, "p_two_sided": 0.000000, "df": 8.4630, "phi_lossy": 0, "zoo_lossy": 768, "phi_coherent": 768},
{"n_seeds": 12, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.662992, "p_two_sided": 0.000000, "df": 19.8127, "phi_lossy": 0, "zoo_lossy": 1536, "phi_coherent": 1536},
{"n_seeds": 16, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.652933, "p_two_sided": 0.000000, "df": 27.0536, "phi_lossy": 0, "zoo_lossy": 2048, "phi_coherent": 2048},
{"n_seeds": 8, "seed_start": 1, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.644408, "p_two_sided": 0.000000, "df": 12.8323, "phi_lossy": 0, "zoo_lossy": 1024, "phi_coherent": 1024},
{"n_seeds": 8, "seed_start": 100, "steps": 40, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.657100, "p_two_sided": 0.000000, "df": 10.5269, "phi_lossy": 0, "zoo_lossy": 1024, "phi_coherent": 1024},
{"n_seeds": 8, "seed_start": 43, "steps": 20, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.660109, "p_two_sided": 0.000000, "df": 11.6110, "phi_lossy": 0, "zoo_lossy": 384, "phi_coherent": 384},
{"n_seeds": 8, "seed_start": 43, "steps": 80, "warmup": 8, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.674746, "p_two_sided": 0.000000, "df": 11.2658, "phi_lossy": 0, "zoo_lossy": 2304, "phi_coherent": 2304},
{"n_seeds": 8, "seed_start": 43, "steps": 40, "warmup": 4, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.676508, "p_two_sided": 0.000000, "df": 10.6771, "phi_lossy": 0, "zoo_lossy": 1152, "phi_coherent": 1152},
{"n_seeds": 8, "seed_start": 43, "steps": 40, "warmup": 16, "dim": 128, "verdict": "ZooWins", "mean_diff": 0.648499, "p_two_sided": 0.000000, "df": 10.3596, "phi_lossy": 0, "zoo_lossy": 768, "phi_coherent": 768},
{"n_seeds": 8, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 64, "verdict": "ZooWins", "mean_diff": 0.699157, "p_two_sided": 0.000000, "df": 13.6532, "phi_lossy": 0, "zoo_lossy": 1024, "phi_coherent": 1024},
{"n_seeds": 8, "seed_start": 43, "steps": 40, "warmup": 8, "dim": 256, "verdict": "ZooWins", "mean_diff": 0.646047, "p_two_sided": 0.000000, "df": 12.0157, "phi_lossy": 0, "zoo_lossy": 1024, "phi_coherent": 1024}
],
"verdict_counts": {"PhiWins": 0, "Tie": 0, "ZooWins": 12},
"distinct_verdicts": 1,
"accuracy_verdict_invariant": true,
"breadth_invariant": true,
"decision": "H0_VERDICT_INVARIANT",
"h2_breadth_note": "breadth axis is the moat; accuracy axis cannot promote/demote FL-002 regardless of decision"
}