This is an incomplete guide to making a HOL-like proof assistant, with pointers to other projects and technical papers, loosely inspired by Make a Lisp.
This is a learning tool, so clarity is preferred to performance.
The design space for LCF-style proof assistants for higher-order logic is quite large, and has not been systematically studied. We will not engage in a systematic study of the possible variants in design.
But digressions about different ways to design things will be made. The design decisions have consequences later on, and we try to point them out to the reader as they emerge.
This is a draft, so it may be obscure at parts. I apologize. Feedback
is welcome, either left here as a comment, or send me an email
(pqnelson
at gmail
).
The main proof assistants in this space include:
- HOL4 "genetic HOL" descended from LCF, handed down on carved stone tablets from Andrew Gordon
- HOL Light is a minimal HOL written in OCaml, which inspired a generation of HOLs
- HOL Zero is a minimalist HOL
- Candle is a HOL built atop CakeML
- Isabelle/HOL is a HOL built atop the Isabelle engine, which is not relevant for our exploration (although Isabelle/Pure may be interesting for readers who wish to work with intuitionistic higher-order logic)
You might have a language in mind. Technically speaking, any Turing complete language suffices, but it's probably wise to use a statically typed functional programming language (e.g., Standard ML, OCaml, Haskell, Elm, and friends).
I'm just going to assume that your language has typechecking, something like algebraic data types, and the ability to make data constructors private. Every functional programming language listed above satisfies these desiderata.
You may want to also include unit tests, to make sure your code works as expected. This may force you to write a unit testing framework (for Standard ML, at least).
The pseudocode provided will be pidgin Standard ML.
If your language provides some kind of associative array (e.g., Haskell's Data.Map), then you're good to go to the next step.
For other languages, you should implement a table data structure. This may involve implementing an "Order" structure. The specification should look something like:
signature ORD = sig
type t;
val compare : t * t -> order;
val eq : t -> t -> bool;
end;
signature TABLE = sig
structure Key : ORD;
type 'a t;
type 'a Table = 'a t; (* synonym for type readability *)
val empty : 'a Table;
(* [insert] Either insert a new key-value entry into a
table, or if the key is already present overwrite the
entry. *)
val insert : 'a Table -> Key.t -> 'a -> 'a Table;
(* [null] tests if a table is empty or not *)
val null : 'a Table -> bool;
(* [size] The number of entries in the table *)
val size : 'a Table -> int;
(* [member] Tests if the Table contains the key. *)
val member : 'a Table -> Key.t -> bool;
val lookup : 'a Table -> Key.t -> 'a option;
val union : 'a Table -> 'a Table -> 'a Table;
end;
Standard ML could use an association list for initial coding (and later implement something more efficient).
Optional:
- Add more quality-of-life functions (like
delete
to remove an entry,map : ('a -> 'b) -> 'a Table -> 'b Table
to apply a function to the value of each entry,merge_into old new
to merge new bindings into an old table, etc.) - You may want to also implement a
Set
data structure. Again, a list works fine for initial explorations (provided there are no duplicates in the list).
We will have substitutions of various kinds in HOL.
A substitution is a list of pairs. Some languages support custom infix
operators, which allows you to write old |-> new
for a constructor
of a substitution replacing old
with new
.
signature SUBST = sig
type ('a, 'b) t;
(* the empty substitution *)
val empty : ('a, 'b) t;
(* test if the given object appears in the domain of the
substitution *)
val contains : ('a, 'b) t -> 'a -> bool;
(* Extend a substitution with a new entry *)
val add : ('a, 'b) t -> 'a -> 'b -> ('a, 'b) t;
end;
HOL is a simply-typed lambda calculus with one or two primitive types.
However, we want to make everything private, so the user can only construct new HOL Types from the methods we provide.
Design decision 1: How pure do you want to make your HOL?
Haskell and friends have no choice, you need to use a State monad to keep track of the list of defined types.
Standard ML, OCaml, and friends, can use a global ref
variable for
tracking these definitions in a Table
.
This decision will impact how we record a new type definition. The
impure global ref
variable approach could use a function signature
like new_type : string * int -> unit
which adds a new type
constructor and its arity to the global table.
Design decision 2: When you invoke mk_type
, are you going to
check the table of type definitions to see the arity is respected?
For pure functional programming languages, this arity check could occur at another stage (e.g., in a state monad).
Optional
- You might want to add helper functions like predicates
is_var : Type -> bool
andis_type : Type -> bool
- It may be useful to include a
to_string : Type -> string
orserialize : Type -> string
function - You may also want to include functions to "hide" constructing
function types (so the same string is used for function type
operators). For example of some of these functions
mk_fun : Type * Type -> Type
takes a domain and range type, then returns the function type of all functions from the domain to the rangeis_fun : Type -> bool
is a predicate checking the given HOL type is a function typedest_fun : Type -> Type * Type
"unpacks" a function type to produce an ordered pair consisting of its domain and range typesdomain : Type -> Type
gives the domain for a function typerange : Type -> Type
gives the range for a function type
signature TYPE = sig
eqtype Type;
(* Make HOL Types an ORD instance *)
type t = Type;
val compare : Type * Type -> order;
val eq : Type -> Type -> bool;
(* Constructors for HOL Types *)
val mk_type : string * Type list -> Type;
val mk_var : string -> Type;
(* Destructuring functions for HOL Types *)
(* (Destructorators?) *)
val dest_type : Type -> string * Type list;
val dest_var : Type -> string;
(* Operations on types *)
(* [subst] will substitute on type variables only *)
val subst : (Type,Type) Subst.t -> Type -> Type;
end;
The terms of HOL are those of a simply-typed lambda calculus with constants.
Design decision 1: How do we represent the syntax tree for terms? There are a lot of different ways to do so.
Jesper Cocx's 1001 Representations of Syntax with Binding surveys the different ways to design data structures for binders and bound variables.
The specification for the Term
data structure should not depend on
which representation you choose to use. But it is highly recommended
to picks some syntax tree which easily distinguishes bound variables
from free variables!
Note: if you have chosen to use, say, locally nameless variables for
the syntax tree, then mk_abs(v,t)
should replace all free instances
of v
in t
by a bound variable. Also note that
dest_abs(mk_abs(v,t)) = (v,t)
--- i.e., dest_abs
should "undo"
this and replace all bound variables attached to the outermost binder
with free variables.
Design decision 2: Depending on your design decision from step 2, you either will have a state monad for the proof assistant, or you're using global mutable variables.
You'll need another table for constant definitions. For global mutable
variables, you can just add another for the constants. You need to
also include a new_const : string * Type.t -> unit
function to add a
new entry to this table.
If you're using state monads, then when mk_const(s,ty)
is invoked
you need to check that a constant with identifier s
and type ty'
was previously defined (and ty
is an instance of ty'
).
Design decision 3: Support explicit substitutions?
That is to say, add a data constructor LzSub of (Term,Term) subst * Term
to the data type for Term
. This allows us to support "explicit substitutions"
which are lazy.
This may be premature optimization, though.
Optional
- Consider implementing a
to_string : Term -> string
orserialize : Term -> string
to convert a term to some string representation
(* Assuming we have a `structure Type : TYPE` implemented *)
signature TERM = sig
eqtype Term;
(* Terms are an ORD instance *)
type t = Term;
val compare : t * t -> order;
val eq : t -> t -> bool;
(* constructors *)
val mk_var : string * Type.t -> Term;
val mk_const : string * Type.t -> Term;
(* [mk_abs(v,t)] Fails if `v` is not a variable produced
from `mk_var`. *)
val mk_abs : Term * Term -> Term;
(* [mk_app] Fails if the types are not compatible. *)
val mk_app : Term * Term -> Term;
(* destructuring functions *)
val dest_var : Term -> string * Type.t;
val dest_const : Term -> string * Type.t;
val dest_app : Term -> Term * Term;
val dest_abs : Term -> Term * Term;
(* operations *)
val type_of : Term -> Type.t;
(* [aconv] Tests two terms if they are alpha convertible *)
val aconv : Term -> Term -> bool;
(* [inst] Instantiate type variables in a term *)
val inst : (Type.t, Type.t) Subst.t -> Term -> Term;
(* [subst] Substitute term variables in a term *)
val subst : (Term, Term) Subst.t -> Term -> Term;
(* [free_vars] Gives a set of free variables appearing in a
term *)
val free_vars : Term -> Term set;
end;
When we apply a theorem to a goal, we will want to match the formula
of the theorem against the formula for the goal. HOL encodes formulas
as Term
of a Boolean Type
(usually called :bool
in most HOLs).
So towards this end, we need to support matching a "pattern" term (like the statement of a theorem) against an "object" term (like the current goal). We will also need to match a "pattern" type against an "object" type.
This means we will need to modify the signatures for TERM
and
TYPE
to add some more functions.
Given two types T1
and T2
, we will process this by creating a list
of equations we need to solve for, initially these will be [T1]
, [T2]
.
When both lists are empty, we have solved the equations, successfully matching the object type against the pattern type, so we return the substitution we have created.
Otherwise, look at the first element from the pattern list of types.
Case 1: Type Variables. If it is a type variable a
, check if we have an entry in our
substitutions for a
to be replaced by something else tau
. If tau
is equal to the first element of the object list of types, then we're
good and can return our substitution. (If a
is in our set of type
variables we should not replace, then check the first element of the
object list of types is equal to a
and raise an exception if it's different.)
If there is no entry in our substitutions (or set of type variables to not replace), then we check if the first element of the object list
Suppose v::pats
is our list of pattern types we're matching against
ty::obs
as our list of object types.
- If
v |-> ty1
is in our substitutionsigma
orv = ty1
appears in the set of type variables we are not expanding, then check thatty1 = ty
.- If
ty1 = ty
, then recursively checkpats
againstobs
using the substitutionsigma
and set of type variables we're not expanding. - Else raise an exception that we would have to bind
v
to two different types
- If
- Else if
v = ty
, then addv
to the set of type variables we are not expanding, and recursively work on matchingpats
againstobs
withsigma
and the updated set of type variables - Else Add
v |-> ty
to the substitutionsigma
, and recursively matchpats
againstobs
with the updated substitution
Case 2: We have two type applications (App(T1,args1))::pats
and (App(T2,args2))::obs
we're trying to match against.
- If
c1 = c2
, then recursively checkargs1++pats
againstargs2++obs
- Else we have two different type operators we're trying to match against, so raise an exception that we've failed.
Case 3: We are trying to match a type application against a type variable, in which case we've failed.
signature TYPE = sig
(* ...as before, except the new functions and exception: *)
exception Match of string;
(* Apply a substitution to a type *)
val subst: (Type,Type) Subst.t -> Type -> Type;
(* Given a pattern type and an object type, and a
substitution of type variables to types (and a set of type
variables to avoid matching), either
(1) Raise a [Match] exception indicating failure, or
(2) Produce the substitution needed to transform the pattern
type into the object type.
If (S,_) == match pat ob ([], empty_set),
then `subst S pat = ob`.
*)
val match : Type -> Type ->
(Type,Type) Subst.t * Type set ->
(Type,Type) Subst.t * Type set;
end;
We want to match a pattern term against an object term. This is done
by recursively working on the structure of the terms, building a
pair (Term,Term) Subst.t
and (Type,Type) Subst.t
of substitutions.
Suppose we are matching a pattern term p
against an object term
ob
.
Case 1: p
is a free variable.
- If there are bound variables in
ob
attached to a binder not contained inob
, then we raise an exception (because free variables should not involve bound variables out of scope, it leads to horrible complications which are a distraction without use) - Else if there is a term
tm
such thatp |-> tm
appears in the term substitutions, then check thatob
is alpha-convertible totm
.- If it's not alpha-convertible, then raise an exception since we're
trying to substitute
p
with two distinct terms. - Otherwise continue working on the remaining subterms matching each other.
- If it's not alpha-convertible, then raise an exception since we're
trying to substitute
- Else there is no term
tm
such thatp |-> tm
appears in the term substitution, so add it to the term substitution, and recursively work our way through the remaining subterm matchings.
Case 2: p
is a constant term Const(c1,ty1)
and ob
is a
constant term Const(c2,ty2)
.
- If
c1
is not the same asc2
, then raise an exception (these constants mismatch each other, there's no substitution possible that works). - Else extend the type substitution
tySubs
withType.match ty1 ty2 tySubs
and recursively work on matching the remaining subterms.
Case 3: p
is an application App(M,N)
and ob
is an
application App(P,Q)
.
Recursively match M
against P
, and N
against Q
, before working
our way through the remaining subterms.
Case 4: p
is an abstraction Abs(Var(_,ty1),M)
(where Var(_,ty2)
is the bound variable) and ob
is an abstraction Abs(Var(_,ty2),N)
.
- Replace the type substitutions
tySubs
with the result ofType.match ty1 ty2 tySubs
- Then match
M
againstN
before continuing with the remaining match subterms
Case 5: p
is a bound variable and ob
is a bound variable.
- If
p
andob
are both bound to the same binder, then continue with matching the remaining subterms - Else raise an exception that we're trying to match different bound variables.
Case 6: For all other p
and ob
situations, we should raise an
exception since we're trying to match nonmatchable terms.
We may have a term substitution replacing a term variable x
of type T1
with a term variable x
of type T2
, where T1
is an instance of
T2
upon applying the type substitution to it. In this case, we
should discard that particular substitution.
Come to think of it, we should apply type substitutions to the types of the term variables appearing in the term substitution.
There are one or two primitive types in a HOL proof assistant (bool
for formulas, and ind
for nonlogical constants).
There are two different sets of primitive constants for a HOL proof
assistant besides the function type constructor (which I'll write as a => b
to distinguish it from a function type in our programming language a -> b
):
{= : 'a => 'a => bool, CHOOSE : ('a => bool) => 'a}
, just polymorphic equality and Hilbert's choice operator{imp : bool => bool => bool, all : ('a => bool) => bool}
which consists of logical implication and a universal quantifier.
We can define all other connectives out of these.
You need to pick one (or both) set of primitive constants to implement.
Isabelle/Pure uses minimal HOL and adds equality to the second set of primitives.
HOL Light takes the first set of primitives.
If you're working with intuitionistic higher-order logic, then you cannot take the Hilbert choice operator (since that gives you classical logic).
The kernel combines the preceding steps to encode an abstract thm
type whose public-facing "constructors" are the inference rules for
the logical system.
The inference rules depend on the choice of primitives from the previous steps. They will become functions of the kernel module.
HOL Light (and later the OpenTheory logical kernel) offers the following rules (which are intuitionistic and assumes equality has been derived or given as primitive):
The last two rules govern how we define constants and terms. Constants
are defined as c = tm
.
Then we define a type T_new
as those terms of type T_old
satisfying a predicate. The "data constructor" for the new term is
given by abs
and the "destructuring function" rep
gives us the
underlying element of T_old
.
We also need to give the list of type variables
Again, depending on the design decisions made earlier (are you using a state monad?), our hand is forced on the design of the kernel.
For people making a "purely functional" kernel, you will need to make
the kernel into a state monad. For Standard ML programmers, this means
adding a type state
to the specification (which will look something
like {type_constants : (string * int) list, term_constants : (string * Type.t) list, axioms : thm list, definitions : (string * thm) list}
). The state will be updated as new definitions and axioms
are added. The initial state will define the primitive types and basic
constants. The inference rules will not return a thm
but state -> thm * state
,
and we'll compose everything together using bind
operators.
Magnus O. Myreen, Scott Owens, and Ramana Kumar's "Steps Towards Verified Implementations of HOL Light" sketches out what a monadic purely functional HOL kernel looks like.
The kernel has the following specification:
signature KERNEL = sig
type thm;
val dest_thm : thm -> Term.t list * Term.t;
val concl : thm -> Term.t;
val hyp : thm -> Term.t list;
(* inference rules *)
val refl : Term.t -> thm;
val assume : Term.t -> thm;
val eqMp : thm -> thm -> thm;
val absThm : Term.t -> thm -> thm;
val appThm : thm -> thm -> thm;
val deductAntisym : thm -> thm -> thm;
val termSubst : (Term.t,Term.t) Subst.t -> thm -> thm;
val typeSubst : (Type.t,Type.t) Subst.t -> thm -> thm;
val betaConv : Term.t -> Term.t -> thm;
(* defineConst "c" rhs *)
val defineConst : string -> Term.t -> thm;
val defineTypeOp : { abs : string
, rep : string
, tyvars : string list} ->
thm ->
thm * thm;
(* danger! *)
val new_axiom : Term.t -> thm;
end;
You will need to include at least the following axiom concerning the type of individuals:
This is the axiom of infinity, stating that ind
is infinite.
The other axiom (consistent with intuitionistic HOL) is eta conversion
The OpenTheory/src/Thm.sig specification sketches out the type signatures for one HOL kernel.
HOL Light's kernel can be found in hol-light/fusion.ml.
HOL4's kernel specification is FinalThm-sig.sml and is implemented in std-thm.ML.