@mchav
Created April 13, 2026 19:37

E-graph Blowup in symbolic_equivalent — Root-Cause Analysis

Discovered: 2026-04-13 during the four-way SR shootout (benchmarks/egraph_sr_shootout.py).

This document captures what triggered the blowup, why it happens, what multiset-valued constructors (per the UW PLSE Egglog Containers post) would change, and the implementation tradeoffs.


1. How it came up

The shootout computes a structural_match metric for each fitted expression by calling srboost.canonical.symbolic_equivalent(recovered, ground_truth). That function:

  1. Builds a fresh egglog.EGraph and registers our standard polynomial ruleset.
  2. Adds both expressions to the graph.
  3. Runs 8 saturation iterations.
  4. Asks egraph.check(eq(a).to(b)) — passes iff both expressions land in the same e-class.
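The four steps above correspond, schematically, to the following shape (a sketch, not the actual srboost.canonical source; the helper expr_to_math is the converter mentioned in section 5's refactor notes, and polynomial_ruleset stands in for however the rules are actually registered):

```
# Schematic sketch of symbolic_equivalent, not the real implementation.
def symbolic_equivalent(recovered, ground_truth):
    egraph = egglog.EGraph()                  # 1. fresh e-graph
    a = expr_to_math(recovered)               # 2. add both expressions
    b = expr_to_math(ground_truth)
    egraph.register(a, b, *polynomial_ruleset)
    egraph.run(8)                             # 3. eight saturation iterations
    egraph.check(eq(a).to(b))                 # 4. passes iff both land in the same e-class
```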

For the 10-term synthetic polynomial in the pilot, this completed in under a second per call. When the shootout moved to the 20-term polynomial in tier 1, a single symbolic_equivalent call grew the e-graph past 10 GB resident and never reached fixpoint within the budget.

The blowup is in the equivalence-checking path, not in fitting or canonicalisation. SRBoost itself fits and canonicalises 20- and 30-term polynomials in seconds; it's the post-fit symbolic-match check that hangs.

2. Data and formulas that trigger it

  • Data: any benchmark with a known-form ground truth where we want to verify structural recovery. In the shootout this is tier 1 (synthetic 10/20/30-term polynomials) and tier 2 (Feynman polynomial subset). Tier 3 (housing) has no ground truth, so symbolic_equivalent is never called and the issue does not appear.
  • Formulas that trigger it: dense additive expressions with many terms. Specifically:
    • 20+ monomial polynomials over 4–5 variables, e.g. 2x0³ - 3x0²x1 + 4x0x1² - x0x1x2 + … + x3³ - 2x3 (20 terms).
    • 30-term polynomials over 5 variables.
    • In general, anything where the canonical form is Add(t₁, Add(t₂, Add(t₃, … t_n))) with n > ~15 and the rules can permute or reparenthesise.
  • What does not trigger it: short expressions ((x+1)², sin²+cos²), single-monomial expressions, transcendentals on their own. The key precondition is a top-level Add (or Mul) with many children.
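To make the precondition concrete, here is a small plain-Python sketch (illustrative names, tuples standing in for e-nodes) that builds the right-nested binary Add chain from the bullet above and counts the Add sites the AC rules can attack:

```python
from functools import reduce

def nested_add(terms):
    """Right-nested binary chain: Add(t1, Add(t2, ... Add(t_{n-1}, t_n)))."""
    return reduce(lambda acc, t: ("Add", t, acc), reversed(terms[:-1]), terms[-1])

def count_adds(node):
    """Each binary Add node is a site where commutativity and associativity fire."""
    if isinstance(node, tuple) and node[0] == "Add":
        return 1 + count_adds(node[1]) + count_adds(node[2])
    return 0

expr = nested_add([f"t{i}" for i in range(1, 21)])  # the 20-term tier-1 case
print(count_adds(expr))  # 19 binary Add nodes
```

A single-monomial expression has zero such sites, which is why it never triggers the blowup.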

3. Why the explosion happens

Our polynomial ruleset includes:

  1. Commutativity: a + b → b + a, a * b → b * a
  2. Associativity: (a + b) + c → a + (b + c) and the Mul analogue
  3. Distributivity: a * (b + c) → a*b + a*c
  4. Identity: a + 0 → a, a * 1 → a
  5. Combine like: a + a → 2*a
  6. Additive inverse: a + (-1)*a → 0

For a 20-term sum represented as Add(Add(Add(… Add(t₁, t₂), t₃) …, t_n)):

  • Commutativity creates a swapped sibling e-node at every Add — n−1 swaps per iteration, all merged into their e-classes.
  • Associativity reparenthesises each sub-Add — the number of distinct tree shapes grows exponentially in n.
  • These two interact: every commuted form has its own associativity orbit.
  • After ~2 saturation iterations the e-graph has memorised most of the n! orderings and Catalan(n−1) parenthesisations of the original sum, even before distributivity contributes anything.

Distributivity then makes it worse — every multiplication on a many-term sum expands the multiset of products — but it is not the primary cause; the AC permutations dominate first.
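The scale of that AC orbit can be sanity-checked in a few lines of Python. These are upper bounds on the forms the rules can justify, not exact e-graph node counts (e-graphs share structure aggressively), but they show why no fixed saturation budget survives n = 20:

```python
from math import comb, factorial

def catalan(n: int) -> int:
    """C(n) = binom(2n, n) / (n + 1): binary parenthesisations of n + 1 leaves."""
    return comb(2 * n, n) // (n + 1)

for n in (10, 20, 30):
    print(f"n={n:2d}: {factorial(n):.2e} orderings, "
          f"{catalan(n - 1):.2e} tree shapes")
```

At n = 20 the orbit already spans roughly 2.4e18 orderings times 1.8e9 parenthesisations; the pilot's 10-term case (3.6e6 orderings) was simply small enough to stay under the budget.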

4. What multisets would fix

If we redefine Add and Mul to take a single MultiSet[Math] argument (instead of binary (Math, Math)), then:

  1. Commutativity becomes structural: Add({a, b, c}) and Add({c, a, b}) are the same e-node, not equivalent forms in the same e-class. The rewrite rule disappears entirely.
  2. Associativity becomes structural: Add({a, b, c}) replaces both Add(a, Add(b, c)) and Add(Add(a, b), c). The rewrite rule disappears entirely.
  3. Combine-like-terms becomes a multiset operation on element multiplicities (MultiSet({a: 2}) for 2*a), pattern-matched cleanly without commutativity ambiguity.
  4. Identity rewrites (a + 0 = a) become "remove 0 from the multiset", trivially decidable.
  5. Additive inverse (a + (-1)*a = 0) becomes "if both a and (-1)*a are in the multiset, remove both" — also clean.

The UW PLSE article shows the binary-vs-multiset benchmark on exactly this problem: a few-term sum with binary AC explodes from 12 to 100+ nodes; the multiset version stays minimal. Our 20-term case is well past where the binary representation breaks down.
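A minimal plain-Python sketch (not the egglog container API; the Var/Add classes are illustrative) of why a multiset-valued Add makes points 1 and 2 structural: hashing on a frozen multiset of children means every ordering and grouping produces the identical node, so there is nothing for a commutativity or associativity rule to do.

```python
from collections import Counter

class Var:
    def __init__(self, name): self.name = name
    def __eq__(self, other): return isinstance(other, Var) and self.name == other.name
    def __hash__(self): return hash(("Var", self.name))

class Add:
    """n-ary Add over a multiset of children: order and grouping never matter."""
    def __init__(self, *terms):
        # A frozenset of (child, multiplicity) pairs is hashable and order-free.
        self.terms = frozenset(Counter(terms).items())
    def __eq__(self, other): return isinstance(other, Add) and self.terms == other.terms
    def __hash__(self): return hash(("Add", self.terms))

a, b, c = Var("a"), Var("b"), Var("c")
assert Add(a, b, c) == Add(c, a, b)              # commutativity: same node, not same e-class
assert hash(Add(a, b, c)) == hash(Add(b, c, a))  # hash-conses to one e-node
assert Add(a, a) != Add(a)                       # multiplicity tracked: a + a is {a: 2}
```

Combine-like-terms (point 3) then reads multiplicities off the Counter directly instead of pattern-matching through an AC orbit.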

5. Tradeoffs

What multisets buy

  • AC explosion goes away entirely. symbolic_equivalent on 20+ term polynomials becomes tractable (expected sub-second).
  • Stronger PL contribution for the paper: using egglog's recent (Feb 2026) container feature for a real-world equivalence-checking problem is the kind of detail PLDI/EGRAPHS reviewers value.
  • Generalises beyond polynomials: sin²(x) + cos²(x) = 1 with multiset Add is a clean rule. Future v2 extensions to transcendentals benefit from the same scaffolding.

What multisets cost

  • Distributivity remains dangerous: a * Add({b, c, d, …, z}) rewrites to Add({a*b, a*c, …, a*z}) in one step — n products from one. For polynomial multiplication of two large sums, that's still n*m products produced by a single rewrite. Multisets fix the AC half of the explosion, not the distributivity half.
  • Refactor effort: ~2–3 days. Need to redefine the Math sort, rewrite expr_to_math / math_to_expr to handle n-ary nodes, port the ruleset (drop AC rules, rewrite identity/like-term rules to use multiset operators), update tests in test_egglog_engine.py and test_canonical.py.
  • Pattern-matching is awkward: egglog's multiset API doesn't let you unify on individual elements directly — you use .map, .fold, and indexed accessors. Some rules become harder to write than the binary-tree version.
  • Container support is on a recent branch: stability and version-pinning concerns for a PyPI dependency.
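The n*m figure in the first bullet is easy to confirm (plain Python, hypothetical term names): distributing one large multiset sum over another produces every pairwise product in a single rewrite, with or without multisets.

```python
from itertools import product

# One distributivity step on Mul(Add({t1..tn}), Add({u1..um})):
# every (t_i, u_j) pair becomes a product term of the result.
left = [f"t{i}" for i in range(20)]
right = [f"u{j}" for j in range(20)]
expanded = [(t, u) for t, u in product(left, right)]
print(len(expanded))  # 400 product terms from a single rewrite on two 20-term sums
```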

6. Alternative — narrow polynomial equivalence comparator

A much smaller fix specific to our use case: replace symbolic_equivalent (when both inputs are polynomial in scope) with a direct canonical-form comparator. Decidable in O(n log n) by sorting monomials and comparing coefficients. ~2 hours of work. Solves the immediate blowup but gives up the "e-graph for equivalence" claim for the polynomial case.
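A sketch of what such a comparator could look like (the representation is hypothetical, not the srboost API: a polynomial as a list of (coefficient, exponent-tuple) pairs, with a tolerance for floating-point coefficients):

```python
from collections import defaultdict

def canonical(poly, tol=1e-9):
    """Canonical form: merge like monomials, drop ~zero coefficients, sort.

    `poly` is a list of (coefficient, exponent_tuple) pairs, e.g.
    2*x0**3 - 3*x0**2*x1  ->  [(2.0, (3, 0)), (-3.0, (2, 1))].
    """
    acc = defaultdict(float)
    for coeff, exps in poly:
        acc[exps] += coeff
    # Sorting the n surviving monomials is the O(n log n) step.
    return sorted((e, c) for e, c in acc.items() if abs(c) > tol)

def poly_equivalent(p, q, tol=1e-9):
    ca, cb = canonical(p, tol), canonical(q, tol)
    return len(ca) == len(cb) and all(
        ea == eb and abs(a - b) <= tol for (ea, a), (eb, b) in zip(ca, cb))

# x0*x1 + x1*x0 collapses to one monomial; term order is irrelevant.
p = [(1.0, (1, 1)), (1.0, (1, 1)), (2.0, (0, 3))]
q = [(2.0, (0, 3)), (2.0, (1, 1))]
assert poly_equivalent(p, q)
```

No rewrite rules, no saturation: the AC orbit never materialises because the canonical form quotients it away up front.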

This is the right immediate move to unblock experimental work. It does not preclude the multiset refactor as a follow-up.

7. Recommended path

Step                          Effort     Outcome
Path A — narrow comparator    2 hrs      Unblocks the shootout today. Polynomial equivalence via canonical-form sorting.
Path B — multiset refactor    2–3 days   Strengthens the paper's PL contribution and generalises beyond polynomials. Done after experimental data is in hand.

Doing A first, then B gives shootout numbers this week and the paper-grade architectural fix before submission.

The narrow comparator is reversible — if multisets land cleanly we can remove it and route through egglog again, this time without blowing up.

8. Implications

Whatever path we pick, we should disclose this honestly:

  • The current ruleset is sound but does not scale to dense many-term sums under naive saturation budgets — a known consequence of binary AC operators in e-graphs.
  • The fix is either (a) a domain-specific canonical-form comparator outside the e-graph, or (b) multiset-valued AC operators inside it.
  • We adopt option (b) using egglog's container feature [cite UW PLSE post], demonstrating that container-augmented e-graphs are practically necessary for AC-heavy SR equivalence checking.

This becomes a small but distinctive contribution in its own right — the implementation now has a concrete technical reason to use the recent container work and explain why it matters for SR.
