Discovered: 2026-04-13 during the four-way SR shootout
(benchmarks/egraph_sr_shootout.py).
This document captures what triggered the blowup, why it happens, what multiset-valued constructors (per the UW PLSE Egglog Containers post) would change, and the implementation tradeoffs.
The shootout computes a structural_match metric for each fitted
expression by calling
srboost.canonical.symbolic_equivalent(recovered, ground_truth). That
function:
- Builds a fresh
egglog.EGraphand registers our standard polynomial ruleset. - Adds both expressions to the graph.
- Runs 8 saturation iterations.
- Asks
egraph.check(eq(a).to(b))— passes iff both expressions land in the same e-class.
For the 10-term synthetic polynomial in the pilot, this completed in
under a second per call. When the shootout moved to the 20-term
polynomial in tier 1, a single symbolic_equivalent call grew the
e-graph past 10 GB resident and never reached fixpoint within the budget.
The blowup is in the equivalence-checking path, not in fitting or canonicalisation. SRBoost itself fits and canonicalises 20- and 30-term polynomials in seconds; it's the post-fit symbolic-match check that hangs.
- Data: any benchmark with a known-form ground truth where we want
to verify structural recovery. In the shootout this is tier 1
(synthetic 10/20/30-term polynomials) and tier 2 (Feynman polynomial
subset). Tier 3 (housing) has no ground truth, so
symbolic_equivalentis never called and the issue does not appear. - Formulas that trigger it: dense additive expressions with many
terms. Specifically:
- 20+ monomial polynomials over 4–5 variables, e.g.
2x0³ - 3x0²x1 + 4x0x1² - x0x1x2 + … + x3³ - 2x3(20 terms). - 30-term polynomials over 5 variables.
- In general, anything where the canonical form is
Add(t₁, Add(t₂, Add(t₃, … t_n)))withn > ~15and the rules can permute or reparenthesise.
- 20+ monomial polynomials over 4–5 variables, e.g.
- What does not trigger it: short expressions (
(x+1)²,sin²+cos²), single-monomial expressions, transcendentals on their own. The key precondition is a top-level Add (or Mul) with many children.
Our polynomial ruleset includes:
- Commutativity:
a + b → b + a,a * b → b * a - Associativity:
(a + b) + c → a + (b + c)and the Mul analogue - Distributivity:
a * (b + c) → a*b + a*c - Identity:
a + 0 → a,a * 1 → a - Combine like:
a + a → 2*a - Additive inverse:
a + (-1)*a → 0
For a 20-term sum represented as
Add(Add(Add(… Add(t₁, t₂), t₃) …, t_n)):
- Commutativity creates a sibling node for every pair of adjacent
operands at every Add —
n−1swaps per iteration, all merged into the e-class. - Associativity reparenthesises each sub-Add — exponential in
n. - These two interact: every commuted form has its own associativity orbit.
- After ~2 saturation iterations the e-graph has memorised most of the
n!/2!reorderings andCatalan(n)parenthesisations of the original sum, even before distributivity contributes anything.
Distributivity then makes it worse — every multiplication on a many-term sum expands the multiset of products — but it is not the primary cause; the AC permutations dominate first.
If we redefine Add and Mul to take a single MultiSet[Math]
argument (instead of binary (Math, Math)), then:
- Commutativity becomes structural:
Add({a, b, c})andAdd({c, a, b})are the same e-node, not equivalent forms in the same e-class. The rewrite rule disappears entirely. - Associativity becomes structural:
Add({a, b, c})replaces bothAdd(a, Add(b, c))andAdd(Add(a, b), c). The rewrite rule disappears entirely. - Combine-like-terms becomes a multiset operation on element
multiplicities (
MultiSet({a: 2})for2*a), pattern-matched cleanly without commutativity ambiguity. - Identity rewrites (
a + 0 = a) become "remove0from the multiset", trivially decidable. - Additive inverse (
a + (-1)*a = 0) becomes "if bothaand(-1)*aare in the multiset, remove both" — also clean.
The UW PLSE article shows the binary-vs-multiset benchmark on exactly this problem: a few-term sum with binary AC explodes from 12 to 100+ nodes; the multiset version stays minimal. Our 20-term case is well past where the binary representation breaks down.
- AC explosion goes away entirely.
symbolic_equivalenton 20+ term polynomials becomes tractable (expected sub-second). - Stronger PL contribution for the paper: using egglog's recent (Feb 2026) container feature for a real-world equivalence-checking problem is the kind of detail PLDI/EGRAPHS reviewers value.
- Generalises beyond polynomials:
sin²(x) + cos²(x) = 1with multiset Add is a clean rule. Future v2 extensions to transcendentals benefit from the same scaffolding.
- Distributivity remains dangerous:
a * Add({b, c, d, …, z})rewrites toAdd({a*b, a*c, …, a*z})in one step — n products from one. For polynomial multiplication of two large sums, that's stilln*mproducts produced by a single rewrite. Multisets fix the AC half of the explosion, not the distributivity half. - Refactor effort: ~2–3 days. Need to redefine the
Mathsort, rewriteexpr_to_math/math_to_exprto handle n-ary nodes, port the ruleset (drop AC rules, rewrite identity/like-term rules to use multiset operators), update tests intest_egglog_engine.pyandtest_canonical.py. - Pattern-matching is awkward: egglog's multiset API doesn't let
you unify on individual elements directly — you use
.map,.fold, and indexed accessors. Some rules become harder to write than the binary-tree version. - Container support is on a recent branch: stability and version-pinning concerns for a PyPI dependency.
A much smaller fix specific to our use case: replace
symbolic_equivalent (when both inputs are polynomial in scope) with
a direct canonical-form comparator. Decidable in O(n log n) by sorting
monomials and comparing coefficients. ~2 hours of work. Solves the
immediate blowup but gives up the "e-graph for equivalence" claim
for the polynomial case.
This is the right immediate move to unblock experimental work. It does not preclude the multiset refactor as a follow-up.
| Step | Effort | Outcome |
|---|---|---|
| Path A — narrow comparator | 2 hrs | Unblocks the shootout today. Polynomial equivalence via canonical-form sorting. |
| Path B — multiset refactor | 2–3 days | Strengthens the paper's PL contribution and generalises beyond polynomials. Done after experimental data is in hand. |
Doing A first, then B gives shootout numbers this week and the paper-grade architectural fix before submission.
The narrow comparator is reversible — if multisets land cleanly we can remove it and route through egglog again, this time without blowing up.
Whatever path we pick, we should disclose this honestly:
- The current ruleset is sound but does not scale to dense many-term sums under naive saturation budgets — a known consequence of binary AC operators in e-graphs.
- The fix is either (a) a domain-specific canonical-form comparator outside the e-graph, or (b) multiset-valued AC operators inside it.
- We adopt option (b) using egglog's container feature [cite UW PLSE post], demonstrating that container-augmented e-graphs are practically necessary for AC-heavy SR equivalence checking.
This becomes a small but distinctive contribution in its own right — the implementation now has a concrete technical reason to use the recent container work and explain why it matters for SR.