This text is an executive summary of an empirical AI research experiment (dated June 4β5, 2026). It evaluates how effectively different Large Language Models can write formal mathematical proofs in Lean 4 (a interactive theorem prover and programming language).
The experimenters used a newly formalized, historically significant mathematical theoremβErdΕs Problem #741(ii)βas a diagnostic "yardstick" to test the boundaries of AI code generation, translation, and reasoning.
Below is a breakdown of the mathematical context, the experiment's design, and the key findings.
-
What is the problem? Originally posed by mathematician Paul ErdΕs, this combinatorics problem asks if there exists an "efficient and unsplittable" subset
$A$ of natural numbers.