| youtube | https://www.youtube.com/watch?v=2V2vJOwFprg |
|---|---|
| telegram | https://t.me/llmaitools/3643 |
| gist | https://gist.github.com/gwpl/b5cd2fd4a4d962bdbd636d09ea6d9599/ |
Eve Bodnia — PhD in algebraic topology (UCSB), BA in quantum physics (Berkeley), CERN researcher as a teenager — is building Logical Intelligence around a claim most LLM practitioners will find either heretical or obvious depending on the day: that autoregressive next-token prediction in language space is structurally incapable of genuine reasoning, and that the fix isn't scaling it — it's abandoning the paradigm entirely. Her alternative is Kona, an energy-based reasoning model (EBM) with latent variables that operates in abstract vector space, learns rules about data rather than surface patterns, and treats language as an optional output interface — not the substrate of thought. With Yann LeCun as founding chair of the technical board and Michael Freedman (Fields Medalist, proved the 4-dimensional Poincaré conjecture) as Chief Mathematician, it's not a vaporware pitch.
The core intellectual move is borrowed from physics, not NLP. Energy minimization principles — light traveling the shortest path, systems settling into lowest-energy states — are the optimization backbone. But the differentiator isn't the energy-based framing (LLMs use energy-based principles internally for layer navigation); it's the latent variable component.
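To make the mechanics concrete, here is a minimal sketch of energy-based inference with a latent variable: given an observation, find the latent "explanation" that minimizes an energy function. The quadratic energy, the hand-coded rule, and the gradient-descent loop are all illustrative assumptions; Kona's actual architecture is unpublished.

```python
# Toy energy-based inference: given an observation x, find the latent z
# that minimizes an energy function E(x, z). Inference here is
# optimization over the latent, not a forward pass predicting tokens.

def energy(x: float, z: float) -> float:
    # Low energy when the latent explanation z is compatible with x.
    # Assumed rule for illustration: observations are ~2x the latent.
    return (x - 2.0 * z) ** 2

def infer_latent(x: float, steps: int = 200, lr: float = 0.05) -> float:
    """Descend the energy surface over z until it settles."""
    z = 0.0
    for _ in range(steps):
        grad = -4.0 * (x - 2.0 * z)  # dE/dz for the quadratic energy above
        z -= lr * grad
    return z

z_hat = infer_latent(8.0)  # minimizer satisfies x = 2z, so z_hat -> 4.0
```

The point of the sketch is the inversion of roles: the model's "answer" is whatever configuration the energy landscape settles into, which is where the physics framing (systems relaxing to lowest-energy states) enters.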
Bodnia's analogy from the talk (full video, ~42:08): your brain maintains a latent mental model of the world — compressed rules, not raw data. If you know Eve is a coffee addict, you don't need 10,000 data points of Eve-near-coffee observations. You have one rule, and it generalizes. Kona learns these rules and stores them in latent spaces. Multiple latent spaces (road behavior, weather patterns, physics constraints) can be forced to communicate via what she describes as a mathematically elegant coupling mechanism.
The reasoning happens entirely in this abstract space. LLMs can then be layered on top as a user interface for natural language I/O — but the reasoning engine never touches language. For a robot controlling circuits at microsecond scale, or an energy grid doing real-time pattern forecasting, you skip the LLM layer entirely. The model speaks directly to hardware.
Key specs from the talk and LinkedIn posts: models range from 16M to 200M parameters (contrast: frontier LLMs are in the hundreds of billions). They run on a single H100. The public Kona Sudoku demo served ~20,000 users for approximately $4 in total compute, versus an estimated $15,000 for frontier LLMs to solve the same puzzle set (and the LLMs often got it wrong). You can try the demo yourself — it pits Kona against five unnamed frontier LLMs in real time on hard Sudoku.
The finding Bodnia is most excited about: spontaneous knowledge transfer emerged at just 16 million parameters. She expected it might require scaling to billions of parameters — analogous to how disordered systems in condensed matter physics can exhibit phase transitions at scale. Instead, the architecture produced extrapolation behavior almost immediately. She frames this with a piano analogy: learn Bach, then play Mozart without Mozart-specific training, because you've internalized the rules of music, not the specific pieces. Then pick up guitar — a technically different instrument, but the fundamentals of music transfer.
This is the property that distinguishes interpolation (what LLMs do well — recombining patterns within their training distribution) from extrapolation (generating genuinely novel knowledge outside the training distribution). The manifold hypothesis is the theoretical backdrop here: high-dimensional data lives on low-dimensional manifolds, and the ability to interpolate along those manifolds is key to deep learning generalization. Bodnia's argument is that LLMs interpolate beautifully within language manifolds but cannot jump to new manifolds — EBMs, operating on learned rules in latent space, can.
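The interpolation/extrapolation distinction can be shown in miniature. Below, a nearest-neighbor model memorizes training points and can only answer near them, while a rule-based model (hand-coded here purely for illustration) generalizes outside the training range. This is a didactic sketch of the distinction itself, not a claim about how any specific model behaves.

```python
# Interpolation vs extrapolation, in miniature.

# Training data: y = x^2 sampled on [0, 1].
TRAIN = [(x / 10.0, (x / 10.0) ** 2) for x in range(11)]

def nn_predict(x: float) -> float:
    """Pattern matching: return the y of the closest memorized x."""
    return min(TRAIN, key=lambda p: abs(p[0] - x))[1]

def rule_predict(x: float) -> float:
    """Rule-based model: has internalized y = x^2 itself."""
    return x * x

# In-distribution (x = 0.55), both land near the true value 0.3025.
# Out-of-distribution (x = 3.0), the memorizer is pinned to the edge
# of its training manifold: nn_predict(3.0) returns 1.0 (the value at
# x = 1.0), while rule_predict(3.0) returns the correct 9.0.
```

The memorizer's failure mode at x = 3.0 is exactly the "cannot jump to new manifolds" behavior Bodnia attributes to language-space interpolation.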
Beyond Kona, Logical Intelligence has built Aleph, a formal theorem prover that recently hit #1 on PutnamBench — the Putnam being a notoriously difficult college-level math competition, often considered harder than IMO problems. Aleph solved 500 of 660 problems at an average cost of $23 per proof. It also independently reproved Erdős Problems 124 and 481 in Lean, and produced verified proofs of conjectures from DeepMind's formal conjecture list in quantum information and probability.
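For readers unfamiliar with the artifact class Aleph produces, here is a trivial machine-checked statement in Lean 4 with Mathlib (not one of Aleph's proofs). The payoff of the format: once the `theorem` compiles, the proof is verified by Lean's kernel rather than trusted on faith.

```lean
-- A trivially machine-checkable fact: sums of squares are nonnegative.
import Mathlib.Tactic

theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity
```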
Per Bodnia's LinkedIn, the upcoming LI-1.0 (the full non-autoregressive, token-free EBM) is planned for 2026 release. When integrated with Aleph, the goal is formally verified code generation at scale — millions of lines of provably correct code for mission-critical systems. This is the "vibe coding" endgame she articulates: move from vibe coding (LLM-generated code that humans debug) to vibe code specification (humans specify constraints in natural language, the system generates mathematically verified implementations).
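The "vibe code specification" workflow can be sketched in a few lines: the human supplies a formal property, the system supplies a candidate implementation, and the candidate is accepted only if checking succeeds. Real verification uses proof (e.g. Lean or SMT solvers), not enumeration; this stand-in checks an exhaustive finite domain, so acceptance is still a genuine guarantee on that domain. All names here are hypothetical illustrations, not any actual Logical Intelligence API.

```python
from itertools import product

def spec(xs: list[int], out: list[int]) -> bool:
    """Human-written constraint: output is the input, sorted."""
    return sorted(xs) == out

def candidate(xs: list[int]) -> list[int]:
    """Stand-in for a machine-generated implementation (insertion sort)."""
    result: list[int] = []
    for x in xs:
        i = 0
        while i < len(result) and result[i] < x:
            i += 1
        result.insert(i, x)
    return result

def verify(impl, domain) -> bool:
    """Accept impl only if the spec holds on every input in the domain."""
    return all(spec(list(xs), impl(list(xs))) for xs in domain)

DOMAIN = list(product(range(3), repeat=3))  # all length-3 lists over {0,1,2}
# verify(candidate, DOMAIN) accepts; a broken impl (identity) is rejected.
```

The design point is the separation of concerns: the human owns `spec`, the system owns `candidate`, and nothing ships unless `verify` passes — which is the debugging burden "vibe coding" currently leaves with humans.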
Bodnia is explicit that EBMs are not competing with LLMs for chatbot or content generation use cases. The targets are domains where probabilistic guessing is unacceptable: chip design (formal verification of circuit layouts — a wrong design going to fab costs millions), surgical robotics (microsecond-scale inference with zero tolerance for hallucination), smart energy grids (real-time pattern analysis with no language component), formally verified code generation, pharmacology (real-time patient-specific analysis combining genetic, biochemical, and language data), and digital asset forecasting.
The self-alignment property is central to the pitch for these domains. LLMs are static after training — if the environment changes, you retrain. Kona, per Bodnia, adapts in real time while remaining bound by human-specified constraints. The constraint-setting remains with humans always; the model is not permitted to deviate from its assigned task boundaries.
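One standard way to make constraints hard rather than advisory is projection: after every online adaptation step, the model's state is clamped back into a human-specified feasible region. The mechanism Kona actually uses is not public; this is a generic constrained-optimization sketch of the property described.

```python
# Self-alignment as constrained optimization, sketched: a parameter adapts
# online toward whatever the environment demands, but is projected back
# into human-specified bounds after every single step.

def project(theta: float, lo: float, hi: float) -> float:
    """Clamp the parameter into the human-specified bounds."""
    return max(lo, min(hi, theta))

def adapt(theta: float, target: float, lo: float, hi: float,
          steps: int = 100, lr: float = 0.1) -> float:
    """Descend (theta - target)^2 online, enforcing bounds each step."""
    for _ in range(steps):
        grad = 2.0 * (theta - target)  # d/dtheta of (theta - target)^2
        theta = project(theta - lr * grad, lo, hi)
    return theta

# The environment pulls theta toward 5.0, but the operator caps it at 2.0:
# adapt(0.0, 5.0, lo=-2.0, hi=2.0) settles at 2.0, never exceeding the cap.
```

Because projection runs inside the loop rather than after it, the parameter never leaves the feasible region even transiently — the "not permitted to deviate" framing, made literal.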
The roster is unusually deep for an early-stage AI lab. Vladislav Isenbaev (CRO) won the ICPC World Championship in 2009, then spent years at Facebook, Cruise Automation, and Nuro before joining. Bodnia co-founded the company with her husband, also an ICPC champion — they came to the US when Facebook recruited him from the ICPC pipeline. The company claims 8 ICPC medals across the team. Michael Freedman, after decades at Microsoft Research founding Station Q (the quantum computing lab), now serves as Chief Mathematician. And LeCun, who just launched AMI Labs pursuing world models for robotics — a philosophically aligned but distinct effort — provides the ecosystem connection. Bodnia notes their models are designed to be compatible with LeCun's JEPA architecture: JEPA handles signal-vs-noise selection, and its output becomes Kona's input.
The conversation is laced with references that trace Bodnia's thinking. The manifold hypothesis frames the information loss problem of translating thought into language. Cormac McCarthy's Kekulé Problem essay — exploring how the unconscious mind solves problems that language cannot reach — resonates with the claim that reasoning is pre-linguistic. David Krakauer (President of the Santa Fe Institute) provides the definition of intelligence she cites approvingly: doing more with less. Krakauer's own Generalist interview argues LLMs are "competitive" cognitive artifacts that atrophy thinking rather than augmenting it — a philosophical alignment with Bodnia's position that LLMs mimic rather than reason.
The Grigori Perelman encounter — Bodnia met him as a teenager during a summer in St. Petersburg — shaped her views on ego in science. Perelman's refusal of the Fields Medal because he felt the work was collective, not individual, clearly influenced her framing. The Feynman connection runs through Perfectly Reasonable Deviations from the Beaten Track — Feynman's collected letters, which she's recommended on X — and the Feynman-esque insistence on producing new knowledge cheaply rather than memorizing existing knowledge expensively.
The open questions for technically minded observers: How does Kona perform on standardized reasoning benchmarks beyond Sudoku? The YouTube comments fairly note that the public demo is narrow. Bodnia's response (implicit in the Aleph results) is that PutnamBench #1 is a more serious signal. The LI-1.0 release in 2026 will be the real test — can a token-free, non-autoregressive architecture deliver on the formally-verified-code-at-scale promise? And can the LeCun/JEPA interoperability produce the modular AI ecosystem she envisions?
For the LLM-skeptic camp (or the LLM-realist camp that acknowledges scaling law diminishing returns), this is one of the most technically grounded alternative bets currently being made. Follow the work: Eve on X, Logical Intelligence, and the Kona demo if you want to see the speed differential yourself.