@THS-on
Last active March 6, 2022 10:22
Non-atomic quotes for Keylime

Non atomic Quotes for attestation

Issue

A TPM contains multiple PCRs and can generate a signed quote over the hash of the concatenated values of a selection of PCRs. The quote itself does not contain the PCR values. To get a matching quote and PCR values, most implementations (including Keylime) use the following trick:

  1. Read PCR values (8 at a time)
  2. Generate quote
  3. Read PCR values (8 at a time)
  4. Check if the PCR values from steps 1 and 3 match; if not, start again with step 1.

This works fine if the PCR values are essentially static, which is the case for all the PCRs used during UEFI Secure Boot, but not when IMA is enabled and extends PCR 10 quite frequently.

In cases where IMA is enabled this might cause unintentional attestation failures because an atomic quote may never be generated. Quote signing is also a computationally expensive operation that might block the TPM from performing other actions.
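
The read, quote, read loop described above can be sketched roughly as follows. This is only an illustration: `read_pcrs` and `generate_quote` are hypothetical callables standing in for the real TPM calls, and the `max_attempts` bound is an assumption added so the sketch terminates (the description above simply restarts at step 1).

```python
from typing import Callable, Dict, Tuple

PcrValues = Dict[int, bytes]


def atomic_quote(
    read_pcrs: Callable[[], PcrValues],      # hypothetical: reads the selected PCRs
    generate_quote: Callable[[], bytes],     # hypothetical: asks the TPM for a quote
    max_attempts: int = 5,                   # assumption: bound on retries
) -> Tuple[bytes, PcrValues]:
    """Retry until the PCR values read before and after quoting are equal."""
    for _ in range(max_attempts):
        before = read_pcrs()       # step 1
        quote = generate_quote()   # step 2 (expensive signing operation)
        after = read_pcrs()        # step 3
        if before == after:        # step 4
            return quote, before
    raise RuntimeError("PCR values kept changing, no atomic quote obtained")
```

With IMA extending PCR 10 frequently, every attempt costs one more signing operation, which is the behaviour the rest of this proposal tries to avoid.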

Current Implementation in Keylime

So that the verifier can attest it, the agent sends the following information:

  1. quote
  2. signature for the quote
  3. PCR values, checked with the method described above to match the hash in the quote.
    1. We currently use a binary data structure from tpm2-tools for that
  4. IMA log (optional)
  5. UEFI log (optional)
  6. NK transport key measured into PCR 16 (might not always be sent)

Then the verifier does the following steps:

  1. Check if the hash in the quote matches the hash of the concatenated PCR values
  2. Check that the signature, quote and AK of the agent match
  3. (Validate data quote for NK against PCR 16 sent by the agent)
  4. (optional) Validate UEFI log
    1. Walk the UEFI log and get the computed PCR values
    2. Validate the UEFI log against a measured boot policy
    3. Check the computed PCR values against the PCR values sent by the agent
  5. (optional) IMA validation
    1. Validate an entry of the IMA log
    2. compute running hash for PCR 10
    3. check if running hash matches PCR 10 sent by the agent
      1. if yes stop
      2. if no goto i. or fail if there are no more entries
  6. Validate static PCRs (all PCRs that were not covered by UEFI or IMA log validation)
    1. Check PCR value against static allow list
  7. Check if all PCRs that should be validated are now actually validated

This model has several disadvantages:

  • It requires that the quote and the sent PCR values match exactly
  • More complex validators (e.g. the ImaBuf validator or measured boot policies) run before the integrity of the data is fully validated against the quote. This is not directly a security issue, but it increases the attack surface of Keylime.
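
As a point of reference, step 1 of the current verifier flow (comparing the hash in the quote with the PCR values sent by the agent) could look roughly like the sketch below. It assumes a single PCR bank, that the digest has already been extracted from the quote structure, and that the digest covers the selected PCR values concatenated in ascending index order. The function names are made up for this example and are not Keylime's.

```python
import hashlib
from typing import Dict


def pcr_digest(pcrs: Dict[int, bytes], hash_alg: str = "sha256") -> bytes:
    """Hash the selected PCR values concatenated in ascending index order."""
    h = hashlib.new(hash_alg)
    for index in sorted(pcrs):
        h.update(pcrs[index])
    return h.digest()


def pcrs_match_quote(
    pcrs: Dict[int, bytes], quote_pcr_digest: bytes, hash_alg: str = "sha256"
) -> bool:
    """True if the PCR values sent by the agent hash to the digest in the quote."""
    return pcr_digest(pcrs, hash_alg) == quote_pcr_digest
```

If PCR 10 is extended between reading the values and quoting, this comparison fails even though nothing is wrong with the system, which is the first disadvantage listed above.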

Moving to non atomic quotes

We do not actually require the PCR values and the quote to be atomic to implement any of the functionality above (if we assume that no PCR other than PCR 10 changes frequently).

The agent still sends the same data as above, with the difference that the PCR values might not match the quote.

The verification steps are now:

  1. Check if the signature, quote and AK match
  2. (skip if no UEFI log validation enabled) walk UEFI log and only save computed PCRs
  3. Build a list with PCR 16 sent by the agent and the computed PCRs from the UEFI log. Where they are not present, use the selected PCR values also sent by the agent.
  4. (skip if no IMA log validation enabled) IMA entry structure validation. In this step we iterate the log until we find a running hash that matches the quote. If there are no external failures this should always work, because entries are first added to the IMA log and then measured into the TPM. In this step we only validate the structure (hash of the entire struct) of the entry, not its content. For the first iteration start with 4.ii, because the quote might already match before we have validated even one entry (this happens often with incremental attestation).
    1. Compute the hash over the running hash for PCR 10 and the list from above. Stop if it matches the quote, or fail if there are no more entries and no match was found. We also save what the last entry of the IMA log was and only validate the content up to that point later.
    2. Parse and compute hash of IMA entry and update running hash for PCR 10
    3. goto i.
  5. If the running hash now matches the quote, we can assume that all the PCR values and data are valid.
  6. (optional) Validate UEFI log using a measured boot policy
    1. Parse UEFI log into JSON format
    2. run measured boot policy and produce a failure if the policy fails
  7. (optional) Validate content of IMA entries
    1. run complex validator on IMA entry and keep track of any failures
    2. goto i if not last entry.
    3. Output all the failures
  8. Validate static PCRs (all PCRs that were not covered by UEFI or IMA log validation)
    1. Check PCR value against static allow list
  9. Check if all PCRs that should be validated are now actually validated

Up to step 5. we return early if a failure occurs. After that we collect them and handle them according to their severity level. More information on that can be found here: https://github.com/keylime/enhancements/blob/master/46_revocation_severity_and_context.md
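
The log walk in steps 3 to 5 can be sketched as below. This is a minimal illustration under several assumptions: a single PCR bank, PCR 10 as the only frequently changing PCR, and `entry_digests` already holding the values that were extended into that bank for each IMA entry. All names are hypothetical. The comparison happens before an entry is consumed, so a quote that already matches (for example with incremental attestation, where `start_value` carries the running hash from the previous round) returns position 0.

```python
import hashlib
from typing import Dict, Iterable, Optional

IMA_PCR = 10


def extend(pcr_value: bytes, entry_digest: bytes, hash_alg: str) -> bytes:
    """PCR extend operation: new = H(old || digest)."""
    return hashlib.new(hash_alg, pcr_value + entry_digest).digest()


def composite_digest(pcrs: Dict[int, bytes], hash_alg: str) -> bytes:
    """Hash of the PCR values concatenated in ascending index order."""
    h = hashlib.new(hash_alg)
    for index in sorted(pcrs):
        h.update(pcrs[index])
    return h.digest()


def find_quote_position(
    entry_digests: Iterable[bytes],
    other_pcrs: Dict[int, bytes],      # list built in step 3 (no PCR 10)
    quote_pcr_digest: bytes,           # digest taken from the verified quote
    start_value: Optional[bytes] = None,
    hash_alg: str = "sha256",
) -> int:
    """Return how many log entries were consumed when the quote matched."""
    size = hashlib.new(hash_alg).digest_size
    running = start_value if start_value is not None else bytes(size)
    pcrs = dict(other_pcrs)
    entries = iter(entry_digests)
    consumed = 0
    while True:
        pcrs[IMA_PCR] = running
        if composite_digest(pcrs, hash_alg) == quote_pcr_digest:
            return consumed            # content validation stops at this entry
        try:
            entry = next(entries)
        except StopIteration:
            raise ValueError("no position in the IMA log matches the quote") from None
        running = extend(running, entry, hash_alg)
        consumed += 1
```

The returned position marks the last entry covered by the quote, so the later content validation only needs to look at entries up to that point.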

Design considerations

Verifier

Currently Keylime puts most of the PCR validation into an abstract TPM, which makes the necessary calls to tpm2-tools for validation. This made sense for supporting TPM 1.2 and TPM 2.0 and for sharing the code with the agent. We no longer support TPM 1.2, and in the long term the Python agent will be deprecated and removed. Therefore the new validation code should have the following properties:

  1. Content validation of logs (UEFI, IMA) should be fully separate from testing that quote and data are valid
  2. This should allow us to add new data for validation easily
  3. If there is a new TPM or a similar (Pluton??) protocol, we should be able to easily add support for it without changing our data validation
  4. Quote validation is abstracted in a way that the current dependency on tpm2-tools can be swapped out for, for example, tpm2-pytss
  5. Easily unit testable. The current code is only covered through end-to-end testing.

With that in mind the proposed steps from above can be implemented without changing how users currently use Keylime.
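
One possible shape for such an abstraction is sketched below. It is not existing Keylime code: `QuoteBackend`, `VerifiedQuote`, `FakeBackend` and `quote_digest` are names invented for this example. The idea is that the data validation code only sees the result of a backend, so a tpm2-tools based backend, a tpm2-pytss based one, or a test double can be exchanged freely.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class VerifiedQuote:
    """What the rest of the verifier needs once the signature is checked."""
    pcr_digest: bytes
    hash_alg: str


class QuoteBackend(Protocol):
    """Anything that can check a quote signature against an AK."""

    def check_quote(
        self, ak_public: bytes, quote: bytes, signature: bytes, nonce: bytes
    ) -> VerifiedQuote:
        ...


class FakeBackend:
    """Test double: accepts any quote and returns a preset digest."""

    def __init__(self, pcr_digest: bytes) -> None:
        self._pcr_digest = pcr_digest

    def check_quote(
        self, ak_public: bytes, quote: bytes, signature: bytes, nonce: bytes
    ) -> VerifiedQuote:
        return VerifiedQuote(self._pcr_digest, "sha256")


def quote_digest(
    backend: QuoteBackend,
    ak_public: bytes,
    quote: bytes,
    signature: bytes,
    nonce: bytes,
) -> bytes:
    """Backend-agnostic entry point used by the data validation code."""
    return backend.check_quote(ak_public, quote, signature, nonce).pcr_digest
```

A test double like `FakeBackend` is what would make property 5 practical: log and PCR validation can be unit tested without a TPM or tpm2-tools installed.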

Agent

The agent needs only one major change, which should simplify the code in most cases. Instead of checking that the PCR values and the quote are atomic, the agent first reads the PCR values, then generates the quote and sends the data to the verifier.

API changes

With this change we want to reduce the dependency on tpm2-tools. For sending the PCR values we currently use a tpm2-tools specific data structure (tpm2_pcrs) and a custom format that encodes it together with the quote and signature. This will be replaced by a JSON document with the following structure:

{"pcrs" :
    {
        "0": "HEX_ENCODED_VALUE_OF_PCR_0",
        "1": "HEX_ENCODED_VALUE_OF_PCR_1",
        ...
    },
 "quote": "BASE64_ENCODED_VALUE_OF_TPM_QUOTE",
 "signature": "BASE64_ENCODED_VALUE_OF_TPM_QUOTE_SIGNATURE"
}

With only the PCRs present that were requested by the verifier.

Note that the old 2.0 API still provides all the necessary information; only the data structures are changed to make implementations simpler, so the verifier can easily support both APIs.
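
A small sketch of producing and parsing the proposed structure, using only the Python standard library. The function names are hypothetical, and the third key is assumed to be spelled "signature".

```python
import base64
import json
from typing import Dict, Tuple


def encode_evidence(pcrs: Dict[int, bytes], quote: bytes, signature: bytes) -> str:
    """Build the JSON document: hex for PCR values, base64 for quote and signature."""
    return json.dumps({
        "pcrs": {str(index): value.hex() for index, value in sorted(pcrs.items())},
        "quote": base64.b64encode(quote).decode("ascii"),
        "signature": base64.b64encode(signature).decode("ascii"),
    })


def decode_evidence(payload: str) -> Tuple[Dict[int, bytes], bytes, bytes]:
    """Parse the JSON document back into raw bytes."""
    data = json.loads(payload)
    pcrs = {int(index): bytes.fromhex(value) for index, value in data["pcrs"].items()}
    quote = base64.b64decode(data["quote"])
    signature = base64.b64decode(data["signature"])
    return pcrs, quote, signature
```

Because both directions need nothing beyond `json` and `base64`, an agent or verifier written in another language would not need any tpm2-tools specific parsing code for this part.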

Other ideas related to this change

Clock and firmware validation

The TPM quote also contains clock and firmware information besides the quote hash. Keylime currently does not use this data. The firmware string can be just another data point that is validated like the logs. With the clock, two checks can be implemented:

  1. Checking that there were no changes to the clock (the safe flag is set to true)
  2. Detecting whether the system was rebooted between two quotes, by checking that the clock advances at the right pace and by checking the reset and restart counters. Note that the two counters are obfuscated to make fingerprinting harder, so they can only be checked for equality.

The second point will allow Keylime to easily detect scenarios where a device left the trusted state for a short period of time and then rebooted to get back into a trusted state.
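
The two checks could be sketched as follows. `ClockInfo` mirrors the fields of the clock information in a TPM 2.0 quote (clock in milliseconds, reset count, restart count, safe flag). The function name, the `elapsed_ms` argument (time between the two quotes as measured by the verifier) and the tolerance value are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClockInfo:
    clock: int           # milliseconds, advances while the TPM is powered
    reset_count: int     # possibly obfuscated, compare for equality only
    restart_count: int   # possibly obfuscated, compare for equality only
    safe: bool


def clock_failures(
    prev: ClockInfo,
    curr: ClockInfo,
    elapsed_ms: int,             # assumption: measured by the verifier
    tolerance_ms: int = 5000,    # assumption: allowed drift between the clocks
) -> List[str]:
    """Compare the clock info of two consecutive quotes and list any problems."""
    failures = []
    if not curr.safe:
        failures.append("clock is not marked safe")
    if (prev.reset_count, prev.restart_count) != (curr.reset_count, curr.restart_count):
        failures.append("reset or restart counter changed between quotes")
    elif abs((curr.clock - prev.clock) - elapsed_ms) > tolerance_ms:
        failures.append("clock did not advance at the expected pace")
    return failures
```

The counters are only compared for equality, in line with the note above that they are obfuscated.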

Moving from tpm2-tools to tpm2-pytss

There are now Python bindings for the TPM in tpm2-pytss, which implements parsing of TPM specific data structures and makes it possible to implement the quote signature verification fully in Python. Moving the verifier to pytss would allow us to remove external calls to tpm2-tools. It might make sense to put more generic validation code into pytss first before using it in Keylime.

@kgold2

kgold2 commented Feb 25, 2022

A few comments:

  1. PCRs other than PCR 10 can be used as long as they append to the same log. I've been told that this is guaranteed in the Linux API.

  2. Do not use PCR 16. It is the debug PCR and other applications will use it. It is not needed for keylime attestation, and is a failure point.

  3. In the (moving) steps, I don't understand why 2. and 4. are optional, and I don't understand goto 1. 5. should be part of the 4. loop. If match, goto 5. else goto 4.

  4. Remember the corner case where the quote matches before checking any log entries. The loop starting point should be 4.ii.

  5. Agent: There is a strange statement about reading PCRs and generating the quote. There is no need to read the PCRs (except for perhaps debugging during development). They are surely not needed to generate the quote.

  6. API: 1 I don't think it is necessary to send PCRs at all. I do think it is necessary to send the event logs. See https://github.com/kgoldman/acs#attestation-response---client-to-server for an example

  7. Minor suggestion: If you're going to change from compressed data, I recommend HEX encoded for everything over base64. It has no drawback and makes debugging easier.

  8. Clock and firmware validation: Beware that this data depends on the key hierarchy. Unless the quote signing key is in the endorsement hierarchy, the data is obfuscated for privacy. The main use case is for detecting a reboot between attestations.

  9. Moving from tpm2-tools to tpm2-pytss: I don't understand the rationale for using either. The verifier does some hashing and memcmp's, but it never needs the TPM at all. Perhaps you want to use the TSS for unmarshaling?

@THS-on

THS-on commented Feb 26, 2022

@kgold2 thanks for the comments.

  1. PCRs other than PCR 10 can be used as long as they append to the same log. I've been told that this is guaranteed in the Linux API.

I think it can be only one PCR, right? So someone could build a kernel with PCR 11 for IMA, but not PCR 10 and 11. If we are already changing the API of Keylime we could add a flag for what PCR is the IMA PCR.

  2. Do not use PCR 16. It is the debug PCR and other applications will use it. It is not needed for keylime attestation, and is a failure point.

I completely agree and that is why in the other proposal I eliminate the need for binding data to a quote using resettable PCRs. Moving to PCR 23 is probably as bad as using PCR 16. Just in general is there a good way to bind a checksum of arbitrary data to a quote?

The idea of using PCR 16 comes probably from here: https://opensecuritytraining.info/IntroToTrustedComputing_files/Day2-1-auth-and-att.pdf

  3. In the (moving) steps, I don't understand why 2. and 4. are optional [..]

They are optional if no attestation of the IMA or UEFI log is specified, so if you are only checking against static PCR values. If verification for IMA and/or UEFI event log is enabled those steps are not optional. I will change the wording to make this more clear.

I don't understand goto 1. 5. should be part of the 4. loop. If match, goto 5. else goto 4.

GitHub renders numbered bullet points differently than my local editor. Should be fixed.

  4. Remember the corner case where the quote matches before checking any log entries. The loop starting point should be 4.ii.

Yes, is fixed.

  5. Agent: There is a strange statement about reading PCRs and generating the quote. There is no need to read the PCRs (except for perhaps debugging during development). They are surely not needed to generate the quote.

You are right that the PCR values are not required to generate the quote, but we still want to read and send them to the verifier to still allow checking them against predefined values. I know that this is brittle, but we currently support that in Keylime, so we cannot break this functionality. In cases where tboot is enabled you will lose the UEFI eventlog, so you need the PCR values.

  6. API: 1 I don't think it is necessary to send PCRs at all. I do think it is necessary to send the event logs. See https://github.com/kgoldman/acs#attestation-response---client-to-server for an example

See answer to 5.

  7. Minor suggestion: If you're going to change from compressed data, I recommend HEX encoded for everything over base64. It has no drawback and makes debugging easier.

It adds more traffic (roughly 50% more than base64), but I don't know if that is an issue in practice. I would like to enable transport compression, but it has been shown to be a great attack vector for DoS attacks.

  8. Clock and firmware validation: Beware that this data depends on the key hierarchy. Unless the quote signing key is in the endorsement hierarchy, the data is obfuscated for privacy. The main use case is for detecting a reboot between attestations.

Yes, is all data obfuscated? I know that reset_count and restart_count are but is the actual time also obfuscated?

  9. Moving from tpm2-tools to tpm2-pytss: I don't understand the rationale for using either. The verifier does some hashing and memcmp's, but it never needs the TPM at all. Perhaps you want to use the TSS for unmarshaling?

tpm2-tools provides tpm2_checkquote and tpm2_makecredential which we use for quote validation and make credential. As you already pointed out we would use tpm2-pytss for unmarshalling TPM data structures and also for having a make credential implementation written in Python.

@kgold2

kgold2 commented Mar 4, 2022

  1. I've already seen proposals for using PCR 11 along with PCR 10. I suggest that the basic design should accommodate more than one.
  2. HEX may add more network traffic, but it's minimal after the first quote. The main performance hits are the quote at the agent side and the signature verification at the verifier. The network traffic has no effect on performance in my benchmarks.
  3. firmware version is obfuscated. Time is not - I thought it was.
  4. The tools do many context save and loads. I posted the analysis previously, and it will affect performance. For makecredential, sure, it's not on a critical path. Checkquote is. Does it use the TPM to check the signature?

@THS-on

THS-on commented Mar 6, 2022

  1. I've already seen proposals for using PCR 11 along with PCR 10. I suggest that the basic design should accommodate more than one.

Good to know! The issue then is that walking the log until we match is a lot harder with two frequently updated PCRs than with just one. (Complexity-wise it might be O(n*m), with n entries for one PCR and m for the other.)

  2. HEX may add more network traffic, but it's minimal after the first quote. The main performance hits are the quote at the agent side and the signature verification at the verifier. The network traffic has no effect on performance in my benchmarks.

Yeah you are right, compared to the IMA log the data is quite small. I'm fine with moving to HEX to make debugging easier.

  4. The tools do many context save and loads. I posted the analysis previously, and it will affect performance. For makecredential, sure, it's not on a critical path. Checkquote is. Does it use the TPM to check the signature?

We use a pure software implementation of make credential that does not use a TPM; the same goes for checkquote. On the server side we use tpm2-tools for those helper tools and not to interact with a TPM.
