@jsmorph
Last active April 9, 2026 19:01
Some math formalization guidelines based on Armstrong and Kempe

GPT 5.4 compiled these instructions from this blog post by Scott Armstrong about his and Julia Kempe's Lean 4 formalization of De Giorgi–Nash–Moser theory.

Armstrong's recent math formalization notes (as interpreted)

This skill is for formalization projects in mature parts of mathematics where the proofs are long, the infrastructure is heavy, and the interfaces are awkward. The work has to be organized around the structure that the prover will need, not around the order in which a paper explains the subject. The worked example at the end uses a Lean 4 formalization of De Giorgi–Nash–Moser theory, but the instructions apply more broadly.

Fix the Target

Start from a theorem package with fixed mathematical scope. State the hypotheses, the top-level conclusions, and the intermediate dependence chain before you write much code. If these objects remain vague, the missing structure will return later as duplicate lemmas, ad hoc wrappers, and theorem statements that are hard to use.

Choose a package that warrants library work. A serious development should leave behind reusable definitions, transport lemmas, admissibility statements, and interfaces that later arguments can use without reconstruction. An isolated theorem may still be worth checking, but it tests much less of the subject and leaves much less behind for the next theorem.

The endpoint should determine the level of abstraction from the start. If the target theorem repeatedly passes between several kinds of objects, the library needs those passages in a stable form early. If the target theorem uses the same reduction, approximation, or localization device in several places, formalize that device once and make the later proofs call it.
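Fixing the target can be made literal in Lean by writing the dependence chain as statement-level stubs before any proofs exist. The fragment below is purely illustrative: every identifier (`IsWeakSubsolution`, `SatisfiesHarnack`, and so on) is an invented placeholder with a `sorry` body, chosen only to show the shape of a fixed chain, not actual Mathlib or project code.

```lean
import Mathlib

-- Illustrative skeleton only: every definition below is an invented,
-- `sorry`-valued stub.  Functions live on plain coordinates `Fin n → ℝ`.
variable {n : ℕ}

def IsWeakSubsolution (u : (Fin n → ℝ) → ℝ) (Ω : Set (Fin n → ℝ)) : Prop := sorry
def IsWeakSupersolution (u : (Fin n → ℝ) → ℝ) (Ω : Set (Fin n → ℝ)) : Prop := sorry

def IsWeakSolution (u : (Fin n → ℝ) → ℝ) (Ω : Set (Fin n → ℝ)) : Prop :=
  IsWeakSubsolution u Ω ∧ IsWeakSupersolution u Ω

def LocallyBoundedOn (u : (Fin n → ℝ) → ℝ) (Ω : Set (Fin n → ℝ)) : Prop := sorry
def SatisfiesHarnack (u : (Fin n → ℝ) → ℝ) (Ω : Set (Fin n → ℝ)) : Prop := sorry

-- The dependence chain, fixed as named statements with `sorry` proofs;
-- later work fills the bodies without renegotiating the interfaces.
theorem localBound_of_subsolution {u : (Fin n → ℝ) → ℝ} {Ω : Set (Fin n → ℝ)}
    (h : IsWeakSubsolution u Ω) : LocallyBoundedOn u Ω := sorry

theorem harnack_of_solution {u : (Fin n → ℝ) → ℝ} {Ω : Set (Fin n → ℝ)}
    (h : IsWeakSolution u Ω) : SatisfiesHarnack u Ω := sorry
```

The point of the stub layer is that the hypotheses and conclusions are negotiated once, up front, and every later proof compiles against them.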

Build the Infrastructure

In advanced formalization, the main work usually lies between the definitions and the headline theorems. Paper proofs often hide that layer inside phrases such as "standard setup," "after localization," or "by a routine reduction." In a prover, the same layer becomes a body of explicit restriction maps, witnesses, approximation lemmas, compatibility statements, and transport results across the formal settings that the argument crosses.

Design this layer so that later theorem statements read naturally. If every serious proof has to fight the same coercion problem, the same representation mismatch, or the same missing admissibility lemma, the defect lies in the architecture. Spend the early effort on the interfaces that carry the subject, because those interfaces determine both the readability of the library and the feasibility of the final proofs.

Hidden work has to be formalized in the form in which later arguments use it. If a proof inserts a cutoff, a truncation, a regularized power, a mollified object, or any other auxiliary construction into a variational statement or a weak formulation, the project needs admissibility lemmas and control statements for that move. Paper proofs often suppress that work because a human reader can reconstruct it quickly, but a prover cannot and should not.
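As a concrete shape, an admissibility statement for a truncation might look as follows. This is a hedged sketch under invented names: `H1`, `IsTestFunction`, and `weakGradient` stand in for whatever the project's weak-formulation layer actually provides, and only the shape of the two lemmas is the point.

```lean
import Mathlib

-- Invented interface names; only the shape of the statements matters.
variable {n : ℕ}

def H1 (Ω : Set (Fin n → ℝ)) : Set ((Fin n → ℝ) → ℝ) := sorry
def IsTestFunction (Ω : Set (Fin n → ℝ)) (v : (Fin n → ℝ) → ℝ) : Prop := sorry
def weakGradient (u : (Fin n → ℝ) → ℝ) : (Fin n → ℝ) → Fin n → ℝ := sorry

/-- Admissibility: the truncation `(u - k)⁺` may be inserted as a test
function in the weak formulation. -/
theorem truncation_isTestFunction {Ω : Set (Fin n → ℝ)} {u : (Fin n → ℝ) → ℝ}
    (hu : u ∈ H1 Ω) (k : ℝ) :
    IsTestFunction Ω (fun x => max (u x - k) 0) := sorry

/-- Control: the weak gradient of the truncation, identified in the exact
form that the later energy estimates consume. -/
theorem weakGradient_truncation {Ω : Set (Fin n → ℝ)} {u : (Fin n → ℝ) → ℝ}
    (hu : u ∈ H1 Ω) (k : ℝ) :
    weakGradient (fun x => max (u x - k) 0)
      = fun x => if k < u x then weakGradient u x else 0 := sorry
```

A paper would insert `(u - k)⁺` into the equation without comment; the prover needs both lemmas as named, reusable facts.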

Quantitative arguments require the same discipline. Iterations, bootstrap schemes, recursive estimates, radius sequences, exponent cascades, and cutoff constructions do not tolerate informal bookkeeping in formal code. The exponents, radii, positivity conditions, and compatibility relations have to be written as part of the mathematics itself.
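One way to make that bookkeeping first-class is to package an iteration's parameters together with their compatibility conditions, so that nothing is carried informally. The structure below is a hypothetical sketch, not taken from any project:

```lean
import Mathlib

-- Hypothetical bookkeeping structure: the radii, exponents, and
-- compatibility conditions of an iteration scheme as explicit fields.
structure IterationScheme where
  r : ℕ → ℝ                        -- shrinking radius sequence
  p : ℕ → ℝ                        -- growing exponent cascade
  r_pos : ∀ k, 0 < r k             -- radii stay positive
  r_anti : ∀ k, r (k + 1) < r k    -- radii strictly decrease
  p_mono : ∀ k, p k < p (k + 1)    -- exponents strictly increase
  one_lt_p0 : 1 < p 0              -- cascade starts above exponent 1
```

Building a concrete instance then forces proofs of exactly the positivity and monotonicity conditions that informal bookkeeping would have left implicit.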

Engineer for Lean

Choose representations that Lean and Mathlib can handle at scale. A mathematically natural abstraction may still elaborate badly, search badly, or trigger coercions through too much structure. Compile time and elaboration behavior therefore belong to the mathematical design, because they determine which definitions and theorem statements remain usable after the first few hundred lemmas.

Organize files and helper lemmas around elaboration behavior rather than around paper exposition. Split files where the elaborator needs relief, and package intermediate results where they stabilize interfaces and cut search space. Keep written notes on the repairs that control elaboration, because the same failures recur and forgetting the fix wastes time.

Most advanced subjects cross several formal domains inside the same proof. A single argument may move from pointwise objects to equivalence classes, from local data to global data, from one coordinate description to another, or from one notion of derivative, integral, or morphism to another. Build those crossings explicitly instead of leaving them to local conversions inside flagship proofs.

Many of the most valuable lemmas in a mature development are interface lemmas rather than headline theorems. They let the main argument move cleanly among the representations it actually uses. If a boundary is crossed constantly, formalize the crossing once and reuse it everywhere.
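Mathlib already models this pattern in places. Crossing from pointwise functions to almost-everywhere equivalence classes, for example, is packaged once in lemmas such as `MeasureTheory.integral_congr_ae`, which later proofs reuse instead of reopening the conversion:

```lean
import Mathlib

open MeasureTheory

-- An interface lemma in action: two a.e.-equal functions have equal
-- integrals, so a proof swaps representatives once and moves on.
example {α : Type*} [MeasurableSpace α] {μ : Measure α} {f g : α → ℝ}
    (hfg : f =ᵐ[μ] g) : ∫ x, f x ∂μ = ∫ x, g x ∂μ :=
  integral_congr_ae hfg
```

A mature analysis development accumulates many such crossings of its own, and their quality largely determines how readable the flagship proofs can be.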

Use Models With Mathematical Control

Use LLMs to write, refactor, and retry proof code under close mathematical direction. Keep target selection, proof architecture, and critical representation choices under human control, because recovering that structure after the model has drifted is slower than fixing the design up front. A model can move quickly through local proof search and code repair, but it does not replace mathematical judgment about what the library should contain.

Use the model's speed to expose uncertainty rather than to conceal it. Ask it to write down the proof skeleton, surface the blocked interfaces, and record the elaboration repairs that make a file tractable. A development that depends on model output but does not preserve the decisions that shaped that output will be difficult to extend and difficult to debug.

Finish the Library

Plan a cleanup phase for duplicate lemmas, vestigial lemmas, oversized files, unstable naming, and persistent linter noise. Those problems make the library harder to extend, harder to audit, and harder to trust even when the main theorems already compile. Cleanup should merge overlaps, delete dead code, repair file boundaries, and expose the interfaces that the development actually needs.

Use cleanup to diagnose structural failures. When one part of the library accumulates wrappers, duplicate statements, or massive files, the problem usually lies in the underlying organization rather than in presentation. A good cleanup shortens later proofs and stabilizes later extensions, instead of only silencing warnings.

The first successful compile marks the point at which library repair becomes visible. The checker certifies correctness of the current code, but it does not guarantee that the current code is readable, durable, or well organized. A development that stops at the first successful compile usually leaves its hardest maintenance problems for the next theorem.

Worked Example: De Giorgi–Nash–Moser in Lean

Fix the target with the same precision in this example. The package is the weak-solution side of elliptic De Giorgi–Nash–Moser theory in Lean 4 on top of Mathlib for uniformly elliptic divergence-form equations with bounded measurable coefficients, without assuming symmetry. The top-level chain is explicit: local boundedness for subsolutions, weak Harnack for supersolutions, the Harnack inequality for solutions, and interior Hölder regularity for solutions.

Choose the level of generality so that the infrastructure cannot be evaded. Here that meant formalizing Sobolev spaces on bounded domains built from weak derivatives, together with density, extension, and chain-rule results, plus weak-formulation machinery for elliptic PDE and functional inequalities such as Poincaré, Sobolev–Poincaré, and John–Nirenberg. In Armstrong's description, this is the first proof-assistant formalization of Sobolev spaces built from weak derivatives at this level of generality and the first large-scale formalization of a modern PDE regularity theorem in this setting.

Write the hidden work out in full. Truncations and regularized powers of u have to be proved admissible as test functions, and their weak gradients have to be controlled in the form required by the weak formulation. The standard unit-ball reduction has to be implemented through explicit rescaling, extension operators, approximation, shell constructions, and change-of-variables estimates rather than left as a verbal reduction.

Treat the iterations as exact quantitative objects. De Giorgi iteration, Moser iteration for subsolutions and supersolutions, the crossover estimate behind weak Harnack, and the oscillation-decay argument behind Hölder regularity each require the exponents, radii, cutoffs, and inequalities to line up exactly. The hidden compatibility conditions therefore have to become named lemmas instead of remaining informal knowledge carried by the author.
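For instance, the fact that a Moser-type exponent cascade p·χᵏ actually exhausts every exponent is exactly the kind of compatibility condition that becomes a named lemma. The statement below is illustrative: the theorem name is our own, and the proof term is assembled from Mathlib lemmas that we believe exist in this form but have not verified against the current Mathlib version.

```lean
import Mathlib

-- Named compatibility fact (illustrative): the exponent cascade of a
-- Moser-type iteration tends to infinity, so the iterated Lᵖ bounds
-- reach the sup norm in the limit.  Proof term sketched from Mathlib.
theorem exponentCascade_tendsto_atTop (p χ : ℝ) (hp : 0 < p) (hχ : 1 < χ) :
    Filter.Tendsto (fun k : ℕ => p * χ ^ k) Filter.atTop Filter.atTop :=
  (tendsto_pow_atTop_atTop_of_one_lt hχ).const_mul_atTop hp
```

Once stated, the lemma is citable by every iteration in the library rather than rediscovered inside each proof.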

Expect prover behavior to shape the mathematics. Large proofs in analysis put real pressure on elaboration, so file structure, helper lemmas, and intermediate definitions have to be chosen with compilation behavior in view. In this project the L^p layer and the Euclidean-space layer produced persistent friction, and EuclideanSpace from Mathlib made elaboration blow up badly enough that the practical instruction is to avoid it for this kind of analysis.

Use models as implementation machinery under close supervision. In the reported workflow, no human hand touched the Lean files directly; the humans supplied detailed proof blueprints, pushed the models through the difficult points, and made the models document elaboration-fixing tricks in markdown for reuse. Under that regime, a project that an earlier estimate put at two to four years for a small experienced Lean team was completed in about two weeks, in a sorry-free and axiom-free development of roughly 56,000 lines beyond Lean and Mathlib.

Reserve time for consolidation even after theorem success. This development still accumulated vestigial lemmas, duplicate lemmas, 739 linter warnings, and several files of roughly 10,000 lines before cleanup. The final days went to code cleanup rather than new mathematics, which is exactly what a project of this kind should expect once the theorems first compile.
