WIP: this gist is being updated as it is being written.
This describes the design for a system to ease building/downloading and installing HOL4 and working with HOL theories. The hoped for advantages are:
- Less building locally: for modules that are just being used, they can be downloaded pre-built and immediately used.
- Easy to get up-and-running with HOL4 as a new user. No complex or esoteric build process.
- Versioning and explicit dependence on versions. May avoid unnecessary rebuilding (although this requires care: only one version of a theory can exist in any well-formed theory graph).
- Explicit tracking of which versions of which modules are installed: no timestamp-based out-of-date dependencies at runtime.
- Better handling of examples: currently they either "make it" into the HOL GitHub examples directory or just live in some unknown repository in the ether. With modules we could have a centralised explorable repository at hol-theorem-prover.org. For example, CakeML releases could be actual module releases and their dependence on other HOL modules would be fine-grained and explicit (not just a corresponding commit of the entire HOL repo).
- Better support for working on multiple HOL-based developments at once, including at different git versions: no more messing around with HOLDIR, CAKEMLDIR, etc. environment variables to set the environment up just right. Everything can be much more explicit and live together without conflict.
Design principles:
- Follow existing practice whenever it is not too costly, to minimise friction in adopting the new system.
- Keep it simple - get it done fast, while making sure it is maintainable.
- Soundness is obviously paramount, but centralised trust is fine. There should always be an option to reverify everything locally if you want.
The point of a module is to create, when imported, an interactive environment
in which some SML values are available, including especially values of type
thm
. Thus, importing modules should be understood as in the context of a
dynamic SML environment called "the session".
Modules depend on each other explicitly. Applying updates to the session in the right order, respecting dependency, in order to import a module, is the job of the module management system.
A module consists (conceptually) of the following:
name
: unique, together with thetype
, in the single global namespace of HOL4 modules.type
: eitherTheory
orLibrary
.version
: a natural numberexports
: bindings (names for SML values) made available on import.state
: runtime state for the module in the session, including e.g. theorem-set data and other databases.- There is only one instance of a module's
state
in the session. - Code in modules is guaranteed to be run in any dependency-respecting order, and only once per module.
Theory
modules have nostate
.
- There is only one instance of a module's
effects
: (optionally excludable?) updates to the session (including primarily thestate
of other modules) that can be applied on import.Library
modules have noeffects
but their exports may include procedures that affect their (and/or other libraries')state
.Theory
modules would typically have effects on the ancestry graph, parser grammar, theory database, inductive type database, etc.
code
: HOL4-flavoured SML code (i.e. to be preprocessed by the quote parser) to run when a module is imported that:- (a) creates the module's
state
, - (b) applies the module's
effects
, - (c) makes the
exports
available to the session. - The
code
of a module includes explicit imports of other modules, thereby specifying the dependency graph on modules. - The
code
for aTheory
module is theScript
file from pre-modules HOL4. The module manager will also save theexports
andeffects
in a format (e.g. the.dat
files in pre-modules HOL4) for more quickly importing a theory without rerunning itscode
. - Some special modules might have additional files they need, e.g. maybe they use code in another language. We can figure out how to deal with this as the need arises.
- (a) creates the module's
This is meant to basically recover the current practice of fooTheory
and
fooLib
SML module usage in pre-modules HOL4, but with versioning and more
opportunities for using pre-built contents.
Should it be permissionless to upload modules? I think probably not - closer to status quo is to have some maintainers of the official module repository. This avoids having to protect against abuse of the namespace in the design: we can leave that to the maintainers' judgement. (Otherwise maybe names should also include an author and maybe also their signature, for extra disambiguation and protection against taking over a module by upgrading it.)
The respository should build modules to ensure that they build successfully and that their version dependencies are indeed sufficient. When a new module is submitted for upload, it may be rejected if a module that depends on it breaks -- it is up to the maintainers to resolve this (maybe the depending module has a too loose import version range, or maybe the updated module is breaking semver expectations with its change).
The module manager is a command line executable for which it is easy to
download a prebuilt binary (from hol-theorem-prover.org) for whatever platforms
we want to support. Let us call it hmm
for now.
All modules in the official repository are "available" for import. The job of
hmm
is to maintain a local cache of the ones the user wants for faster
importing. Modules (at specific versions) that are locally cached are
considered "installed".
There should be just one cache location. By default $XDG_CACHE_HOME/hol
, but
configurable via updating a reference in .hol-config.sml
(which should
perhaps also be allowed and encouraged to exist in $XDG_CONFIG_HOME/hol
).
hmm
should be able to list (and search by name for) all available modules in
the repository (indicating which are installed), and also list just the modules
(and versions) that are actually installed.
hmm
should be able to install a module given a name (and optionally a desired
version). It will install the highest available (or requested) version and a
compatible set of versions of all its dependencies.
hmm
should be able to uninstall (delete from cache) modules by name, to clean
up unneeded modules (according to dependencies of installed modules), and to
upgrade modules to newer versions that are still compatible with their
dependents.
hmm
should be able to "rebuild" modules for verification purposes:
- run a Theory module's code to create its imports (possibly in an interactive session) assuming its dependencies are fine
- run a Theory module and all its dependencies from scratch, to create its
imports in a session without any
DISK_THM
oracles The result here doesn't have to be an interactive session, it could just be some command line output indicating that reverification was successful.
hmm
should be able to launch hol
at the session of a particular module.
Usually this would be a module that is under development. Under-development
modules might need some special version treatment when in that state: probably
they would be "locally" installed but without a corresponding published entry
in the official repository.
Modules are defined by the files that contain their code
. The name
and
type
of a module is determined by its filename: fooScript.sml
is used for
a Theory
module whose name is foo
, and fooLib.sml
is used for a
Library
module whose name is foo
.
The version of the module is specified implicitly by the module management system: it is either a published version from the repository, or, if the code is being developed and has been changed, the next highest unpublished version.
Dependencies are specified using Import
syntax, as is being discussed in
this issue. Modules to
be imported are specified by name and type. The version is implicit: it is
determined as the version for the dependency in an optimal (cannot increase any
without breaking the build) set of versions of all dependencies for the target
module. There may be need to be support for explicitly requesting a version for
an Import
, especially for modules under development - but I think these
should not be published.
The exports
of Theory
modules are defined implicitly as currently (i.e. in
the way that pre-modules HOL4 automatically produces a Theory.sig
SML
signature for a script file).
The exports
of Library
modules should be defined explicitly either within
the fooLib.sml
file or (better matching common practice) in an accompanying
fooLib.sig
file, and should be defined as an SML signature named fooLib
.
The code of a Library
module should be define as an SML module conforming to
the signature with the same name (again matching current practice).
WIP
- interactive development, how to create the session for interactive development, how to change dependencies on the fly, how to reset the session state, etc.
- Uploading (publishing) modules to the repository of modules, making releases
- Practicalities of developing
Library
modules - Text editor integration
For each module, the session it creates when loaded should be as-if all its
dependencies were loaded (once each) in some dependency-respecting order,
modulo the use of the DISK_THM
oracle to shortcut reproving many theorems.
The resulting logical environment is sound as long as DISK_THM
is only used
in a safe way, i.e. skipped proofs would have worked if they were run instead
of skipped and would have produced the same logical content.
The module repository is curated so that there is a set of versions of each published (and accepted for inclusion) module that can be downloaded, and the set includes at least the latest version.
Our aim for published installed modules is that they can be loaded. To achieve this, we impose the following invariant on the module repository: for each module X with a version V available from the repository, there is a known set of versions D for all its dependencies such that X (at V) can be loaded successfully, and all versions in D are also available.
Ideally, we also strive to ensure that no later set of versions D' (i.e. for each module the version in D' is no less than the version in D) that can also be used to load X (at V). This is not enforced as an invariant, though, because we want to allow for progress to be made on some modules without waiting for other dependent modules to be updated that might be being maintained at a slower pace.
When downloading and installing X (at V) from the repository, we simply
download all the dependent modules at the version specified in D, as well as X
at V itself. The content of what is downloaded should be sufficient for hmm
to create a session for X at V.
To maintain the invariant when a new version of a module (or a new module altogether) is added to the repository, we require the uploader to provide a working set of versions for its dependencies and, as repository managers, check that the full build is possible at these versions before publishing.
The repository managers also attempt to update the set dependency versions for modules that are dependent on the updated module. If the build succeeds with a version updated, that updated version replaces the old one in the set. If not, the dependent package is marked as needing to be updated to support the newer version of its dependencies.
How the work of updating is done is up to the repository curators: it could mean assigning the task to a maintainer (or some other volunteer), renaming the module's old version so it can be depended on without expecting any more updates, dropping some module from the repository, or some other solution.
WIP
- How to get to here from pre-modules HOL4 and
Holmake
- Role of
Holmake
code in the module manager - How and when to use Poly/ML heaps to improve loading speed