This is a rough sketch I've put together in my mind of how an 'ACME daemon' might end up looking.
acmetool is designed for batch operation, which works well for small use cases, but large-scale deployments will be better served by a daemon. This daemon will probably expose a service via an HTTP API, so that arbitrary parts of a service provider's stack can request certificates.
This API will need to be asynchronous, as it may take arbitrarily long for 'acmed' to obtain certificates. For example, if a service provider's customer changes their nameservers to those of the provider, this change may take time to propagate. The service provider will need to keep checking until the changes have propagated and it can complete the challenges needed for issuance.
This suggests an API for requesting certificates for certain hostnames, with the daemon performing challenge self-tests periodically until it determines that it can obtain certificates, and then doing so.
Likewise, there will need to be an API for the retrieval of certificates. There is some prior art in the field of programmatically retrieving and using certificates: an open-source TLS terminator called Bud can look up certificates for SANs via HTTP requests. Implementing this interface would be useful and desirable.
There's a question of who should generate, own and control private keys. On the one hand, it seems more secure for private keys to be generated where they will ultimately be used; in that case the consuming server would need to transmit a CSR to acmed for it to use when requesting certificates. But this is unworkable where more than one server needs to make use of the certificate. You could have some daemon to manage and hand out these keys, but that sounds... rather like acmed, so you would just be duplicating the role.
In other words, it makes the most sense for acmed to generate and manage these private keys, and to hand them out to authorized clients. This is indeed required by the protocol Bud expects, as described above.
The whole point of acmed would be to handle large-scale deployments that acmetool is unsuited for, so there's no point if we don't start with a large-scale instantiation of the problem. So let's say:
You have 10 load balancers, each of which needs to be able to obtain certificates and private keys. They all serve the same hostnames. Five of the load balancers are in one data centre and five are in another. Two instances of acmed must be run for redundancy, and acmed must support this.
Under this model, whatever database acmed uses will need to be suitable for use by multiple accessors, and thus networked. PostgreSQL will probably be preferred, though other backends could perhaps be supported too. Since PostgreSQL supports LISTEN/NOTIFY, it may also be possible to support configuring acmed via database changes as an alternative to the HTTP interface.
acmed will need to retry challenges automatically and periodically. Status information for a given hostname should probably be exposed in machine-readable form via an HTTP endpoint, as should the ability to trigger a retry.
The challenge completion methods supported by acmetool (HTTP proxy/listener, webroot, hooks, TLS-SNI and DNS hooks) remain applicable in these cases, so these parts of acmetool can probably be reused with minor refactoring.
The process execution model of the hooks system might in extreme cases become a bottleneck. One possible solution is to allow UNIX domain sockets to be placed in the hooks directory; these would be detected and a defined protocol, perhaps HTTP, spoken over them. More practically, though, it would probably make sense to simply allow HTTP-based hooks to be configured.
PUT /names/{hostname}
Indicate a desire for a hostname.
DELETE /names/{hostname}
Removes the desire for a hostname. The certificates are not deleted, but are no longer accessible
via the API. If the hostname is requested again, the existing certificates are reused if they have
not expired. There could perhaps be an option for revocation, in which case the certificates really are deleted.
GET /names/{hostname}/cert
The PEM-encoded certificate, if available; otherwise 404, or perhaps a default certificate or an in-progress response.
GET /names/{hostname}/privkey
PEM-encoded private key, if available.
GET /names/{hostname}/chain
A series of PEM-encoded certificates in the chain, not including the end certificate.
GET /names/{hostname}/bud
A Bud-compatible JSON response.
GET /names/{hostname}/status
A JSON (or perhaps also HTML, via content negotiation) response indicating recent failures to complete
challenges or acquire certificates.
Hostname lumping: how to control, and allow the expression of, the lumping of hostnames into different certificates? In acmetool you get to specify this arbitrarily via target files. Some possibilities:

- Instead of the above interface, expose the idea of 'targets' just as acmetool does, possibly with a search or conditional-creation API so a client can determine whether a certificate satisfying the target already exists.

- Allow lumping to be done arbitrarily and controlled by static configuration; for example, lump large numbers of unrelated hostnames into certificates. This may require delaying requests for a hostname until more requests pile up.
This is a service-based design, as opposed to a library-based design. Either way, implementing the service will probably result in components which are easy to use as a library.
I have come to the same conclusions for certsd on many points. Would you be interested in combining efforts here? Note that my certs repo contains two commands: certs and certsd. certs is synchronous, whereas certsd is a long-running background process, basically the same thing as your proposed acmed. I hadn't yet pushed most of my notes about what I was planning for certsd, but this more or less covers it. The advantage of combining with certs is that you could reuse the same underlying code base for obtaining and renewing certificates, etc.