@sleevi
Last active July 24, 2024 14:50

On Twitter the other day, I was lamenting the state of OCSP stapling support on Linux servers, and several people asked me to write up what I think the requirements are for OCSP stapling support.

  1. Support for keeping a long-lived (disk) cache of OCSP responses.

    This should be fairly simple. Restarting the service shouldn't blow away responses that were previously obtained. The storage doesn't need to be disk, just stable - and disk is an easy stable storage for most server operators.

  2. Validate responses before serving them, to make sure they are something the client will accept.

    There are a number of ways to botch this on the server, and sadly, a number of ways in which CAs can botch their response generators. The most immediate and obvious issues are situations where you get a 'revoked' response, or receive an OCSP 'tryLater' or 'internalError' response. However, there are also subtler issues, like making sure the OCSP response is actually well-formed (sometimes uploads to CDNs are botched), is time-valid for the current time (sometimes CDNs serve stale files), is for the certificate requested (yes, sadly, really), and is free of PKI-related errors (for example, the delegated OCSP signer's certificate being expired).

  3. Refresh the response, in the background, with sufficient time before expiration.

    A rule of thumb would be to fetch at notBefore + (notAfter - notBefore) / 2 (for an OCSP response, thisUpdate + (nextUpdate - thisUpdate) / 2), which is saying "start fetching halfway through the validity period". You want enough time to handle situations like the OCSP responder giving you junk, but also sufficient time to raise an alert if something has gone really wrong.

    What you do NOT want to do is start fetching OCSP responses only the first time you need one, or wait until the response has fully expired - that creates really terrible experiences all around, and makes your CA an even bigger point of failure.

  4. That said, even with background refreshing, such a system should observe the Lightweight OCSP Profile of RFC 5019.

    This more or less boils down to "Use GET requests whenever possible, and observe HTTP cache semantics." Given how complicated the cache semantics can be to get right in a client, this can be surprisingly hard to implement correctly.

  5. As with any system doing background requests on a remote server, don't be a jerk and hammer the server when things are bad.

    The Internet is a strange and wonderful place, and sometimes servers and networks have issues. When a server supporting OCSP stapling has trouble getting a response, hopefully it does something smarter than retrying in a busy loop, hammering the OCSP server into further oblivion. This may seem implied by the previous two points, but it's worth spelling out.

  6. Distributed or proxiable fetching

    From talking with server operators, a variety of situations come up as challenges for OCSP stapling. One common bucket is the problem of front-end and back-end splits - there may be thousands of FE servers, all with the same certificate, all needing to staple an OCSP response. You don't want all of them hammering the OCSP server - ideally, you'd have one request, made in the backend, whose result updates them all.

    A variation of this problem is FEs that aren't actually allowed to initiate outbound connections. Sometimes it's required that the FE talk to a proxy server, sometimes it's just outright blocked - so a system should be robust in handling that distribution.

    This may not be a problem for the OCSP daemon to solve - it could be that the matter is just treated as a general configuration management/distribution problem - but at least it should be clear to those deploying the config what the tradeoffs are. For example, is it possible for the config distribution system to mangle responses? Should FEs still check the validity of incoming responses?

  7. The ability to serve old responses while fetching new responses.

    That is, serving and fetching shouldn't be mutually exclusive - there isn't 'ONE TRUE RESPONSE', so some flexibility to hold multiple responses is needed.

  8. Some idea of what to do when "things go bad".

    What happens when it's been 7 days, no new OCSP response can be obtained, and the current response is about to expire? Do you:

    1. Stop the (web/email/ftp/xmpp) service?
    2. Stop serving stapled OCSP responses?

    Especially in a world where Must-Staple becomes more prevalent, what action should be taken when things go awful? If it's a Must-Staple cert, it might be better to stop the service entirely (thus causing monitoring to really flip out) rather than serve a bad response or no response, both of which may result in even worse user experiences.

  9. Configurable OCSP responder per-certificate-being-checked.

    The CA/Browser Forum's Baseline Requirements allow CAs to omit the authorityInfoAccess extension in situations where the subscriber has agreed to staple. This agreement can be made by contractual or technical means, which is to say that it's not predicated on the Must-Staple extension being present in the certificate. The reason for allowing this omission is smaller certificates, which offsets a (very small) amount of the size increase from the stapled OCSP response.

    For these certificates, the server operator will need to configure what the OCSP responder URL is for that certificate.

  10. Staple by default.

    If you can get all the above worked out, with sane behaviours, there is very little reason that OCSP stapling shouldn't be on by default. Make it happen!
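Items 2 and 7 can be sketched together: reject responses that fail the checks from item 2, and keep several candidates around so the server can keep serving the best valid one while a new fetch is in flight. This is a minimal Python sketch under stated assumptions, not a real OCSP implementation: `ParsedResponse`, `problems` and `pick_staple` are hypothetical names, and DER parsing plus signature and delegated-signer verification are assumed to happen elsewhere (reported here only as `signature_ok`).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ParsedResponse:
    """Hypothetical, already-parsed OCSP response (parsing/verification not shown)."""
    response_status: str  # "successful", "tryLater", "internalError", ...
    cert_status: str      # "good", "revoked" or "unknown"
    serial_number: int    # serial of the certificate this response covers
    this_update: datetime  # timezone-aware (UTC)
    next_update: datetime  # timezone-aware (UTC)
    signature_ok: bool    # outcome of signature / delegated-signer checks done elsewhere


def problems(resp: ParsedResponse, expected_serial: int,
             now: Optional[datetime] = None) -> List[str]:
    """Return the reasons this response must not be stapled; empty means OK."""
    if now is None:
        now = datetime.now(timezone.utc)
    errs = []
    if resp.response_status != "successful":
        errs.append(f"responder returned {resp.response_status}")
    if resp.cert_status != "good":
        errs.append(f"certificate status is {resp.cert_status}")
    if resp.serial_number != expected_serial:
        errs.append("response is for a different certificate")
    if not resp.this_update <= now < resp.next_update:
        errs.append("response is not valid at the current time")
    if not resp.signature_ok:
        errs.append("signature or signer certificate check failed")
    return errs


def pick_staple(candidates: Iterable[ParsedResponse], expected_serial: int,
                now: Optional[datetime] = None) -> Optional[ParsedResponse]:
    """Serve the valid candidate with the newest thisUpdate, or None if none is valid."""
    if now is None:
        now = datetime.now(timezone.utc)
    valid = [r for r in candidates if not problems(r, expected_serial, now)]
    return max(valid, key=lambda r: r.this_update, default=None)
```

Because old responses are never evicted just because a fetch started, a junk or stale download simply loses to the response already held. Preferring the newest thisUpdate is one reasonable choice; preferring the most remaining validity would be another.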

If this seems like an unfairly long list, the reality is that virtually all of this is supported by Microsoft IIS today. The Microsoft documentation is a bit spread out, but there is good material both for starters and for further reading.

Given this long list of things, which do seem somewhat 'basic', it seems a shame to require every TLS server to reimplement them. This would be ideal as a common, stand-alone daemon/service, which could then interface with a variety of TLS servers (IMAP, SMTP, HTTP, FTP, XMPP, etc).
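The timing rules such a daemon needs (items 3 and 5) are small enough to sketch. The Python below assumes the halfway-point refresh from item 3 and uses exponential backoff with "equal jitter" (half of each wait is fixed, half is random) after failed fetches; the function names and the 60-second base / 1-hour cap are illustrative assumptions, not values from any specification.

```python
import random
from datetime import datetime, timedelta


def refresh_at(this_update: datetime, next_update: datetime) -> datetime:
    """Item 3's rule of thumb: start refreshing halfway through the validity window."""
    return this_update + (next_update - this_update) / 2


def retry_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Exponential backoff with 'equal jitter', in seconds.

    The window doubles per attempt (60s, 120s, 240s, ... up to the cap). Half of it
    is always waited; the other half is random so a fleet doesn't retry in lockstep.
    """
    window = min(cap, base * (2 ** min(attempt, 32)))  # clamp exponent to avoid overflow
    return window / 2 + random.uniform(0, window / 2)


def next_attempt(now: datetime, this_update: datetime, next_update: datetime,
                 consecutive_failures: int) -> datetime:
    """When should the background fetcher next contact the OCSP responder?"""
    start = refresh_at(this_update, next_update)
    if now < start:
        return start  # current response is young: sleep until the halfway point
    if consecutive_failures == 0:
        return now    # refresh window is open and nothing has failed yet: fetch now
    return now + timedelta(seconds=retry_delay(consecutive_failures - 1))
```

The fixed half guarantees a minimum pause even on an unlucky random draw, so a struggling responder is never hit in a busy loop, while the random half spreads out many servers that all failed at the same moment.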

Perhaps the most basic interface for this is simply dropping the OCSP response to a well-known path pre-agreed with the server. The server can monitor for changes to this file. When changes are noticed, it can start serving the new response. While some logic (such as shutting down the service) may be more complicated, that at least starts with some basic functionality.
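A minimal sketch of that file-drop handoff, assuming both sides share a filesystem path: the daemon writes to a temporary file in the same directory and renames it into place, so the server never reads a half-written response, and the server polls the file's inode, modification time and size to notice changes. `publish_staple` and `StapleWatcher` are hypothetical names; a real server would also re-run the checks from item 2 before swapping in the new bytes, and might use a change-notification API instead of polling.

```python
import os
import tempfile
from typing import Optional, Tuple


def publish_staple(path: str, der: bytes) -> None:
    """Daemon side: atomically replace the staple file at `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".staple-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(der)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, path)     # atomic within one filesystem on POSIX
    except BaseException:
        os.unlink(tmp)
        raise


class StapleWatcher:
    """Server side: poll the agreed path and reload the staple when it changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.current: Optional[bytes] = None
        self._seen: Optional[Tuple[int, int, int]] = None

    def poll(self) -> bool:
        """Return True if a new staple was loaded on this call."""
        try:
            st = os.stat(self.path)
            key = (st.st_ino, st.st_mtime_ns, st.st_size)
            if key == self._seen:
                return False
            with open(self.path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            return False  # nothing published (yet): keep whatever we already serve
        # A real server would validate `data` (item 2) before swapping it in.
        self.current, self._seen = data, key
        return True
```

Since every publish goes through a fresh temporary file, each rename produces a new inode, which makes change detection reliable even if two writes land within the same timestamp granularity.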

@richmoore

In terms of implementing such a tool with openssl things aren't helped by the weakness of the documentation in this area. Particularly the documentation on how to actually verify OCSP responses correctly is poor.

@j47996

j47996 commented Dec 3, 2015

Rather than a file-based service, I'd want (with my SMTP hat on) a socket-connected service. Fire a cert (maybe a cert chain) at it, get back STATUS_GOOD, STATUS_BAD, or NO_INFO_YET. Oh, and if it talks certificate revocation lists as well as OCSP on the far side, so much the better (but I still want the simple interface on my side).
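One way to picture that interface, as a hedged sketch: the daemon keeps its cache keyed by certificate fingerprint, and each request line gets one of the three replies. The one-fingerprint-per-line wire format, the `CacheEntry` shape and the mapping choices (an expired entry counts as NO_INFO_YET, an "unknown" status as STATUS_BAD) are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    GOOD = "STATUS_GOOD"
    BAD = "STATUS_BAD"
    NO_INFO_YET = "NO_INFO_YET"


@dataclass
class CacheEntry:
    cert_status: str       # "good", "revoked" or "unknown", from the last valid response
    next_update: datetime  # when that response stops being valid (timezone-aware)


def status_for(entry: Optional[CacheEntry], now: datetime) -> Status:
    if entry is None or now >= entry.next_update:
        return Status.NO_INFO_YET  # never fetched, or the cached answer has expired
    return Status.GOOD if entry.cert_status == "good" else Status.BAD


def handle_line(line: str, cache: Dict[str, CacheEntry],
                now: Optional[datetime] = None) -> str:
    """Hypothetical wire format: one hex certificate fingerprint per request line."""
    if now is None:
        now = datetime.now(timezone.utc)
    fingerprint = line.strip().lower()
    return status_for(cache.get(fingerprint), now).value + "\n"
```

Keeping the reply to a single word means the TLS server side needs no OCSP or ASN.1 knowledge at all; everything below that line (OCSP, CRLs, caching) stays inside the daemon.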

Also to watch: RFC 6961. I know of no TLS library implementations yet.

@drwilco

drwilco commented Dec 3, 2015

We've found HAProxy's TLS support to be wonderful for OCSP stapling.

@richsalz

richsalz commented Jun 8, 2016

This is excellent. Except that for #8: if I run the server and I know the cert is good, then I'm not gonna bring it down because the CA can't keep up. :)

@KellerFuchs

@richsalz Probably not, but you likely want to send very loud warnings to the admin if the staple cannot be renewed (optionally, since it's likely that only Must-Staple users care).
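The item 8 question, with the refinements from this thread, can be written down as a small decision function. This sketch assumes the default is to stop stapling and alert loudly (as suggested in the two comments above), with stopping the service as an explicit opt-in for Must-Staple certificates; the `Action` names, the `decide` signature and the two-day warning threshold are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    SERVE = "serve the staple"
    WARN = "serve the staple and alert the admin loudly"
    DROP_STAPLE = "stop stapling and alert the admin"
    STOP_SERVICE = "stop the service and alert the admin"


def decide(now: datetime, next_update: datetime, must_staple: bool,
           stop_service_on_must_staple: bool = False,
           warn_before: timedelta = timedelta(days=2)) -> Action:
    """Choose what to do, given when the newest valid staple expires."""
    if now >= next_update:
        # Nothing valid is left to staple.
        if must_staple and stop_service_on_must_staple:
            return Action.STOP_SERVICE  # opt-in: make monitoring really flip out
        return Action.DROP_STAPLE
    if next_update - now <= warn_before:
        return Action.WARN  # refreshes have evidently been failing for a while
    return Action.SERVE
```

Making the stop-the-service branch opt-in reflects that operators disagree here; for a Must-Staple certificate, dropping the staple also means clients that enforce Must-Staple will fail, so the choice is between two kinds of outage.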

@philpennock

I stumbled across this while implementing a tool I wanted, to do much of what is described here, for use with Exim. Exim will only serve staples if something else fetches/maintains them, and I want to move away from cron/shell.

I've gotten the first alpha, v0.1, functioning. It's very early, far too early to rely upon this in production yet, but might be of interest?
https://github.com/PennockTech/ocsprenewer and go get -v go.pennock.tech/ocsprenewer/cmd/ocsprenewer

Standalone tool to renew staples on disk. Initial testing is for use with Exim and Let's Encrypt. Implements the timer-based approach, because I'd already decided DHCP's timer-based model was the right thing to use. :)

Bug-reports appreciated (PRs more so!)

@mholt
Copy link

mholt commented Jan 30, 2018

Just wanted to give an update on Caddy's support for these suggestions. Update: Caddy now fully implements ALL suggestions (including 8 in the sense that we have given it some thought but are waiting for Must-Staple to mature).

Caddy is, and has been for the last couple years, the only web server in its class to implement robust OCSP stapling. Its sites have successfully weathered OCSP responder outages that brought other sites down (I remember a notable instance of gnu.org going down last year for a while).

Hopefully someday certificate lifetimes will be as short as OCSP response lifetimes, with the possibility to then drop OCSP entirely and simplify our infrastructure.
