TPM (Trusted Platform Module) is as useful for preventing real attackers as the TSA is at preventing real terrorists. The architecture is fundamentally flawed and most existing implementations are completely broken. I thought this argument was settled decades ago[1] when "trusted computing" was introduced mostly as a way to provide DRM and ownership capabilities to organizations. It largely failed to impact the consumer market when it was introduced back in the early 2000s. However, recently there seems to be a movement by certain parties to reintroduce this failed product to the market. Microsoft argues that in order to use Windows 11, you need TPM 2.0 compatible hardware because[2]:
The Trusted Platform Module(TPM) requirement enables Windows 11 to be a true Passwordless operating system, addressing phishing and other password-based attacks that are easier for attackers to execute when the TPM is not present.
Google is trying to force the ideas of "trusted computing" onto web browsers[3] with a proposal which, although it does not mention TPM explicitly, describes a design that would fit well with it.
Even Canonical is jumping on the bandwagon by bringing TPM based full-disk-encryption to Ubuntu[4], claiming that it "eliminates the need for users to manually enter passphrases during boot" which is vacuously true (you also don't need to enter a passphrase if you disable it altogether but that doesn't mean it's any more secure) and that it will "eliminate the attacker’s ability to perform offline brute-force attacks against the passphrase" which is just plain false (you can still brute force, only at a much slower speed).
According to the organization that designed it[5]:
The TPM is a microcontroller that stores keys, passwords and digital certificates. It typically is affixed to the motherboard of a PC. It potentially can be used in any computing device that requires these functions. The nature of this silicon ensures that the information stored there is made more secure from external software attack and physical theft. Security processes, such as digital signature and key exchange, are protected through the secure TCG subsystem. Access to data and secrets in a platform could be denied if the boot sequence is not as expected. Critical applications and capabilities such as secure email, secure web access and local protection of data are thereby made much more secure.
In a more technical sense, TPM is a device (which could be physically on the same die as your CPU) that stores encryption keys. This device can be made arbitrarily secure against physical extraction attacks. In order for the TPM to hand the keys to the CPU, the CPU needs to "attest" to certain facts about its executing environment such as a hash of the bootloader, kernel, and so on. It (the CPU) does this by performing a series of "measurements" (cryptographic hashes) on each boot component and sending them to the TPM. The TPM has an append-only ledger which records these measurements, and the final encryption key is only released (by the TPM to the CPU) if the measurements match the values provisioned when the encryption key was created (or updated).
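The measure-then-release flow above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real TPM interface: the `ToyTPM` class, its single PCR, and the `boot` helper are all hypothetical names made up for illustration, and SHA-256 is assumed as the hash.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 PCR


class ToyTPM:
    """Hypothetical single-PCR model for illustration; not a real TPM API."""

    def __init__(self) -> None:
        self.pcr = bytes(PCR_SIZE)  # all zeros after a reset
        self._sealed = None         # (expected PCR value, secret) once sealed

    def reset(self) -> None:
        self.pcr = bytes(PCR_SIZE)

    def extend(self, component: bytes) -> None:
        # The host hashes the component; the ledger only ever folds in digests.
        digest = hashlib.sha256(component).digest()
        # Append-only: new value = H(old value || digest). There is no "set".
        self.pcr = hashlib.sha256(self.pcr + digest).digest()

    def seal(self, secret: bytes) -> None:
        # Bind the secret to whatever the PCR holds right now.
        self._sealed = (self.pcr, secret)

    def unseal(self) -> bytes:
        if self._sealed is None:
            raise PermissionError("nothing has been sealed")
        expected, secret = self._sealed
        if self.pcr != expected:
            raise PermissionError("PCR value does not match the sealed policy")
        return secret


def boot(tpm: ToyTPM, chain: list[bytes]) -> None:
    """Simulate a power cycle followed by measuring each boot component."""
    tpm.reset()
    for component in chain:
        tpm.extend(component)


tpm = ToyTPM()
boot(tpm, [b"bootloader v1", b"kernel v1"])
tpm.seal(b"disk encryption key")

boot(tpm, [b"bootloader v1", b"kernel v1"])
key = tpm.unseal()  # same measurements, so the key is released
```

Booting the same model with a different component (say `b"kernel v2"`) and calling `unseal()` raises `PermissionError`, which is also why a legitimate update requires re-sealing the key.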
After the first revision of this article was published, it became clear that a significant portion of the audience does not have a security background and may not understand what a "threat model" is and why it is important when designing a security system.
Threat modeling is absolutely crucial because it helps you understand not only what the relevant threats are but what the irrelevant threats are as well. A common mistake when criticizing a secure design is to point out "flaws" that fall outside the model. We wish to avoid that here. For the designers, knowing the threat model allows you to identify weaknesses and decide whether each weakness is relevant or not.
As an example, consider the threat model of a crypto wallet. Because the data the wallet is protecting is potentially worth millions of dollars, it needs to have very strong security guarantees. For example, if someone physically steals the wallet, they should not be able to physically extract the secrets. When the wallet is plugged into a computer, the computer should not be able to read the keys. The keys should not be extractable through side channel attacks. These are just some examples of requirements that would be passed to the designers, who will then do their best to meet them.
Now consider something like an MP3 player. If an attacker were to successfully infiltrate the MP3 player, the most valuable assets would probably be some songs and maybe listener preference data. This is significantly less valuable than the crypto wallet and so it would be unreasonable to invest the same amount of care and effort into securing the MP3 player as to securing the crypto wallet. That means if someone makes a claim that they're able to dump all the data off an MP3 player through a fault injection attack, we treat it as "interesting but not newsworthy" because it falls way outside the threat model. However, if someone claims that they're able to stream any song from the MP3 player without paying, it becomes much more interesting because that is likely something the device should protect against.
The problem is that when you do not have an explicit threat model defined, it is hard to judge whether something is "secure" or not. Most of the time for commercial products, architectural design documentation is not public, so security researchers have to "reverse engineer" (for lack of a better term) the threat model and attack it. This sometimes leads to conflict between a company and researchers when they disagree on whether something is a "security issue" or not. In our case, the TPM architecture documentation is public so we are able to precisely argue about the security of the design without ambiguity.
In the following sections, we will suss out this threat model for TPM and then argue that:
- The model is too weak with respect to the kind of data it claims to protect.
- There are inconsistencies between what the threat model actually protects and what advocates of the product informally claim it protects.
The latest version of the TPM 2.0 specification[6] comes in four volumes. The first volume titled "architecture" is 276 pages long. If you Cmd+F and search for "threat model", "security model", or "assets", you will find no results. This is the fundamental issue with the TPM architecture. It was not written by security people but by a committee of mostly hardware engineers. The specification is so convoluted that an entire book[7] was written to explain it, and that book uses the term "security through incomprehensibility." If you are able to comprehend it though, there are a few security claims that we can extract:
"A trusted platform will also offer Protected Locations (see 10.3) for the keys and data objects entrusted to it." (pg 25, 1.59)
10.3: "As noted, access to any data on a TPM requires use of a Protected Capability. Therefore, all information on a TPM is in a Shielded Location. The contents of a Shielded Location are not disclosed unless the disclosure is intended by the definition of the Protected Capability. A TPM is not allowed to export data from a Shielded Location other than by using a Protected Capability." (pg 31, 1.59)
"Shielded Location: location on a TPM that contains data that is shielded from access by any entity other than the TPM and which may be operated on only by a Protected Capability" (pg 12, 1.59)
"Protected Capability: operation performed by the TPM on data in a Shielded Location in response to a command sent to the TPM" (pg 10, 1.59)
"When the sensitive portion of an object is not held in a Shielded Location on the TPM, it is encrypted. When encrypted, but not on the TPM, it is not protected from deletion, but it is protected from disclosure of its sensitive portions. Wherever it is stored, it is in a Protected Location." (pg 19, 1.59)
Translation: keys cannot be read out by the host unless the right sequence of commands is executed.
"The nominal method of establishing trust in a key is with a certificate indicating that the processes used for creating and protecting the key meets necessary security criteria. A certificate may be provided by shipping the TPM with an embedded key (that is, an Endorsement Key) along with a Certificate of Authenticity for the EK. The EK and its certificate may be used to associate credentials (certificates) with other TPM keys; this process is described in 9.5.3.3. When a certified key has attributes that let it sign TPM-created data, it may attest to the TPM-resident record of platform characteristics that affect the integrity (trustworthiness) of a platform." (pg 25, 1.59)
9.5.3.3: "Evidence of TPM residency may be provided using a previously generated certificate for another key on the same TPM. An EK or Platform Certificate may provide this evidence." (pg 28, 1.59)
Translation: there is a unique private key (with an accompanying certificate) inside the TPM that allows for attesting that a key came from that TPM.
"The Core Root of Trust for Measurement (CRTM) is the starting point of measurement. This process makes the initial measurements of the platform that are Extended into PCR in the TPM. For measurements to be meaningful, the executing code needs to control the environment in which it is running, so that the values recorded in the TPM are representative of the initial trust state of the platform.
A power-on reset creates an environment in which the platform is in a known initial state, with the main CPU running code from some well-defined initial location. Since that code has exclusive control of the platform at that time, it may make measurements of the platform from firmware. From these initial measurements, a chain of trust may be established. Because this chain of trust is created once when the platform is reset, no change of the initial trust state is possible, so it is called a static RTM (S-RTM).
An alternative method of initializing the platform is available on some processor architectures. It lets the CPU act as the CRTM and apply protections to portions of memory it measures. This process lets a new chain of trust start without rebooting the platform. Because the RTM may be re-established dynamically, this method is called dynamic RTM (D-RTM). Both S-RTM and D-RTM may take a system in an unknown state and return it to a known state. The D-RTM has the advantage of not requiring the system to be rebooted.
An integrity measurement is a value that represents a possible change in the trust state of the platform. The measured object may be anything of meaning but is often
- a data value,
- the hash of code or data, or
- an indication of the signer of some code or data.
The RTM (usually, code running on the CPU) makes these measurements and records them in RTS using Extend. The Extend process (see 17.2) allows the TPM to accumulate an indefinite number of measurements in a relatively small amount of memory.
The digest of an arbitrary set of integrity measurements is statistically unique, and an evaluator might know the values representing particular sequences of measurements. To handle cases where PCR values are not well known, the RTM keeps a log of individual measurements. The PCR values may be used to determine the accuracy of the log, and log entries may be evaluated individually to determine if the change in system state indicated by the event is acceptable.
Implementers play a role in determining how event data is partitioned. TCG’s platform-specific specifications provide additional insight into specifying platform configuration and representation as well as anticipated consumers of measurement data.
Integrity reporting is the process of attesting to integrity measurements recorded in a PCR. The philosophy behind integrity measurement, logging, and reporting is that a platform may enter any state possible — including undesirable or insecure states — but is required to accurately report those states. An independent process may evaluate the integrity states and determine an appropriate response." (pg 29-30, 1.59)
This last point is quoted in its entirety because it is so important. This is the central security claim as to the usefulness of the TPM. First, we need to put aside D-RTM (we will revisit this at the end). For all practical purposes, S-RTM is the one that is widely used and is the one which Windows BitLocker depends on (as well as Ubuntu's recently announced FDE support). We will also focus on x86 implementations because although TPM is platform agnostic, it was developed by Intel and Microsoft (along with some now defunct PC companies) and those are the ones driving adoption.
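The relationship between the measurement log and the PCR value described in the quoted text can be sketched as follows. This is a minimal illustration, assuming a single SHA-256 PCR that starts at zero; the event descriptions and the `replay` and `log_matches_pcr` helpers are hypothetical names, not part of any real TCG log format.

```python
import hashlib


def extend(pcr: bytes, digest: bytes) -> bytes:
    """Fold one measurement digest into the running PCR value."""
    return hashlib.sha256(pcr + digest).digest()


def replay(event_log: list[tuple[str, bytes]]) -> bytes:
    """Recompute the PCR value implied by a list of (description, digest) events."""
    pcr = bytes(32)
    for _description, digest in event_log:
        pcr = extend(pcr, digest)
    return pcr


def log_matches_pcr(event_log: list[tuple[str, bytes]], reported_pcr: bytes) -> bool:
    """An evaluator trusts the log only if replaying it reproduces the PCR."""
    return replay(event_log) == reported_pcr


# Hypothetical log entries for illustration only.
log = [
    ("firmware settings", hashlib.sha256(b"settings").digest()),
    ("boot manager", hashlib.sha256(b"bootmgr").digest()),
    ("kernel", hashlib.sha256(b"kernel").digest()),
]
pcr_from_tpm = replay(log)

# Dropping, editing, or reordering an entry breaks the match.
edited_log = [log[0], log[2], log[1]]
```

Note what this does and does not establish: the PCR vouches for the accuracy of the log, but both are only as honest as the code that produced the measurements in the first place.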
The fundamental design flaw with TPM is that you must trust that all software running on the system up until the measurement is sealed is free of vulnerabilities. We know that software has vulnerabilities, so what does adding a TPM do to change that situation? The threat model defends against a malicious agent being able to change past measurements but makes no claims about future measurements! If the attacker exploits one vulnerability in the boot chain, they can get your keys and be able to decrypt your data forever (until the key is changed). This is no more secure than not having full disk encryption enabled at all because in either case, one exploit means the attacker has access to the data. In both cases (TPM enabled and no encryption), vulnerabilities can be patched by the vendor but it would be too late for the people who were already attacked. Let's give a more concrete example of a Windows boot on a typical Intel consumer platform. Note that this is simplified for clarity (see [8] for a more detailed process) and the measurement points come from a hypothetical implementation based on the TPM design, not from reverse engineering specific components to see what is done in practice.
- Intel Boot ROM starts execution and fetches the BIOS from the SPI flash. If Intel Boot Guard and Intel BIOS Guard are enabled (and that's a big if), the signature of the BIOS is checked against a public key whose hash is stored in fuses.
- UEFI boot starts, the TPM is initialized in the PEI phase, and certain configuration settings are measured.
- Dozens (up to hundreds) of UEFI drivers, written by various OEMs with varying levels of competence and care, are loaded.
- Windows bootmgfw.efi is loaded by the firmware. If Secure Boot is enabled, the signature is checked against a key in the DB (signature database) signed by a Microsoft public key in the KEK (Key Exchange Key) database, which is itself signed by the PK (Platform Key) owned by the system vendor. The hash of bootmgr is measured.
- Windows winload.efi is loaded by bootmgr. The signature is checked to be from Microsoft. The hash of winload is measured.
- Various NT kernel components are loaded and measured.
- PCR7 is sealed from future appends and the TPM hands over the key to the Windows kernel for BitLocker[9].
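To make the trust assumption in this sequence concrete, here is a small sketch of what the TPM can and cannot see. It is a toy model, not the real Windows boot chain: the stage names are placeholders, each stage is modeled as a hypothetical pair of (image actually executed, image bytes reported to the TPM), and a single SHA-256 PCR is assumed.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def boot(stages: list[tuple[bytes, bytes]]) -> bytes:
    """Each stage is (image actually executed, image bytes reported to the TPM).

    The PCR is computed only from what is reported. The TPM has no view of
    what the CPU actually executed.
    """
    pcr = bytes(32)
    for _executed, reported in stages:
        pcr = sha256(pcr + sha256(reported))
    return pcr


def honest(image: bytes) -> tuple[bytes, bytes]:
    """A stage that is measured truthfully."""
    return (image, image)


GOOD = [honest(b"firmware"), honest(b"bootmgr"), honest(b"winload"), honest(b"kernel")]
expected_pcr = boot(GOOD)

# Case 1: the kernel is swapped and every stage still measures honestly.
# The PCR changes, so a sealed key would not be released.
swapped = GOOD[:3] + [honest(b"kernel+implant")]

# Case 2: a vulnerability in an earlier stage lets the attacker run the
# implant while reporting the bytes of the known-good kernel.
# The PCR is identical to the good boot, so the key would be released.
lying = GOOD[:3] + [(b"kernel+implant", b"kernel")]

detected = boot(swapped) != expected_pcr
undetected = boot(lying) == expected_pcr
```

Both `detected` and `undetected` come out true: measurement catches tampering only when the code doing the measuring has not itself been subverted.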
Note that at any point in this process, if the attacker is able to control code execution, there is no way for the TPM to know that the measurement it was just handed wasn't a lie. Now let's assume you are an attacker trying to get the BitLocker keys. What can you do?
- You can exploit a vulnerability in the Windows kernel to extract the BitLocker key from memory without needing to care about TPM at all. Note in all likelihood, the vulnerability would require a user to be logged in if the attacker is remote. If the attacker is physical, they may be able to log in to a guest account and attempt to escalate their privileges.
- An attacker with physical access can attempt a DMA attack. In more recent platforms, Thunderbolt is protected[10] against this. However, if the attacker is able to open up the computer and access PCIe lanes, it is dependent on OEM implementation (meaning it is probably done incorrectly) to protect DMA access.
- Although Microsoft claims that BitLocker is secured against bootkits[9], this depends on the fact that the bootkit is measured. There are tonnes of BIOS and UEFI driver code (as well as Windows loader code) that parse untrusted input and a vulnerability in any of those components could be made into a bootkit. There are entire network stacks, USB stacks, and filesystems implemented in UEFI firmware.
These are just some examples of attacks that fall outside of what TPM defends against but where an unsuspecting user might think they are protected due to the marketing of companies with a business interest in TPM. This is important because the vast majority of user data compromises happen through malware on a live system or (to a lesser extent) bootkits. TPM provides no protection against that. Even in the realm of physical attacks (which affect a small minority of people), TPM provides a very limited guarantee for the security of user data.
You will often hear about UEFI Secure Boot in the context of TPM but it is important to understand that they are orthogonal technologies that serve different purposes.
Secure Boot, when implemented correctly (which includes Intel BIOS Guard and a fused platform key hash, something that is rarely done in consumer platforms), is supposed to guarantee the integrity of the code running on a system and that this code is trusted by the platform integrator (or system administrator). If any check fails, the system will refuse to boot. Typically a wide range of software (Windows, Ubuntu, Debian) is permitted to run, as well as different versions of that software.
Measured Boot can be used to attest to a specific configuration of the system at a point in time. This can include the current version of the software that is installed and any security policy that is being enforced. This can, in turn, allow user data to be entangled with the measurement (through, for example, BitLocker) in order to ensure (in theory, not in practice) that the data can be used only by that same configuration. This means you cannot boot into Ubuntu and extract your Windows data from a BitLocker encrypted volume (again, in theory).
A lot of marketing[11] tries to conflate the two by making claims about TPM being used to protect system integrity. That is purposely misleading because Secure Boot already gives you that and does not require any new hardware.
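The difference between the two mechanisms can be sketched side by side. This is a simplified illustration under stated assumptions: real Secure Boot mostly verifies signatures against enrolled certificates, whereas this sketch substitutes an allowlist of image digests so that it stays within the Python standard library. The function names and image contents are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class BootRefused(Exception):
    """Raised when Secure Boot meets an image it was not told to trust."""


def secure_boot(chain: list[bytes], allowed_digests: set[bytes]) -> None:
    """Verify each image before running it; stop at the first untrusted one.

    Assumption: a digest allowlist stands in for signature verification.
    """
    for image in chain:
        if sha256(image) not in allowed_digests:
            raise BootRefused("image is not in the allowlist")
        # Image would run here. Nothing is recorded about which one ran.


def measured_boot(chain: list[bytes]) -> bytes:
    """Never refuse anything; just record exactly what was loaded."""
    pcr = bytes(32)
    for image in chain:
        pcr = sha256(pcr + sha256(image))
    return pcr


allowed = {sha256(b"windows loader"), sha256(b"ubuntu loader")}

# Secure Boot: both operating systems pass, anything else is refused.
secure_boot([b"windows loader"], allowed)
secure_boot([b"ubuntu loader"], allowed)

# Measured Boot: both run, but they leave different PCR values behind,
# so a key sealed under one configuration is not released under the other.
pcr_windows = measured_boot([b"windows loader"])
pcr_ubuntu = measured_boot([b"ubuntu loader"])
```

Secure Boot answers "is this code allowed to run?" and Measured Boot answers "which code ran?". Neither answers the other's question.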
Up until now, we have focused on full disk encryption and protection of data against malware as things that TPM is not good at protecting. This is because a lot of the marketing around TPM for consumer devices focuses on that. However, there are legitimate uses for TPM and they all revolve around attestation.
If we go back to the threat model, we will see that each TPM has a private key that allows it to attest that a certain key was created from that device. This means that the TPM is able to cryptographically prove that "yes, I was handed this exact sequence of measurements since the computer was first powered on." This can be useful in server and cloud environments where there is no physical attacker and you trust the data center. You would still be vulnerable to sophisticated bootkits and kernel exploits but as long as your systems are up to date, you should be well protected. You would then be able to perform attestation[12] on that server before using it to process sensitive data.
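The shape of that attestation exchange can be sketched as a challenge and response. This is only an outline under loud assumptions: a real TPM signs its quote with an asymmetric attestation key whose certificate chains back to the Endorsement Key, while this sketch substitutes an HMAC from the Python standard library as a stand-in for that signature. The names `make_quote` and `verify_quote` are hypothetical.

```python
import hashlib
import hmac
import os


def make_quote(attestation_key: bytes, pcr: bytes, nonce: bytes) -> bytes:
    """Server side: authenticate the current PCR value together with the nonce.

    Assumption: HMAC stands in for the TPM's asymmetric quote signature.
    """
    return hmac.new(attestation_key, nonce + pcr, hashlib.sha256).digest()


def verify_quote(attestation_key: bytes, expected_pcr: bytes, nonce: bytes,
                 reported_pcr: bytes, quote: bytes) -> bool:
    """Verifier side: check authenticity, freshness, and the expected state."""
    tag = hmac.new(attestation_key, nonce + reported_pcr, hashlib.sha256).digest()
    authentic = hmac.compare_digest(tag, quote)
    known_good = hmac.compare_digest(reported_pcr, expected_pcr)
    return authentic and known_good


key = os.urandom(32)                      # stand-in for the attestation key
good_pcr = hashlib.sha256(b"known-good boot chain").digest()

nonce = os.urandom(16)                    # fresh per challenge, prevents replay
quote = make_quote(key, good_pcr, nonce)
accepted = verify_quote(key, good_pcr, nonce, good_pcr, quote)

# An old quote replayed against a new challenge is rejected.
new_nonce = os.urandom(16)
replay_accepted = verify_quote(key, good_pcr, new_nonce, good_pcr, quote)
```

The verifier learns that the TPM was handed a particular sequence of measurements, nothing more. Whether those measurements were truthful is still subject to everything discussed earlier.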
You can also use the TPM + PIN as a sort of Yubikey, which is useful if you are sure that the keys do not stay resident in memory and that the system has not already been compromised (in which case the PIN can be stolen). This is useful, for example, as an SSH agent [29] and less useful for full-disk encryption (which requires the key to be resident). You are also able to outsource private key signing operations directly to the TPM and not reveal the key to the CPU (on an uncompromised system).
Another use case is anti-cheats and DRM. These are controversial topics and certain vocal figures consider it to be "treacherous computing"[13]. However, the threat model of anti-cheats and DRM has always been a tradeoff between how long the protection lasts and how much work it takes to break it. A new release movie is much more valuable today than it is ten years from now, so a DRM that works for ten years is successful even if it is eventually broken. This is not true for user data and it is irresponsible to claim that TPM provides protection for user data in the same way.
The main issue with the TPM architecture is that the TPM exists as an external agent outside of the CPU while at the same time having to validate what is running inside the CPU. It is unaware of CPU ring levels and their differing privileges. It is unaware of DMA agents. It has a single communication interface with the CPU and it must treat that interface as trusted up until the measurements are sealed and the keys are delivered. It is unrealistic to expect security with those assumptions but fortunately, the problem has already been addressed with CPU features such as Intel TXT (Trusted Execution Technology)[14], which implements a D-RTM that can present the TPM with measurements from a more authoritative source. TXT launch works as follows:
GETSEC[SENTER] broadcasts messages to the chipset and other physical or logical processors in the platform. In response, other logical processors perform basic cleanup, signal readiness to proceed, and wait for messages to join the environment created by the MLE. As this sequence requires synchronization, there is an initiating logical processor (ILP) and responding logical processor(s) (RLP(s)). The ILP must be the system bootstrap processor (BSP), which is the processor with IA32_APIC_BASE MSR.BSP = 1. RLPs are also often referred to as application processors (APs).
After all logical processors signal their readiness to join and are in the wait state, the initiating logical processor loads, authenticates, and executes the AC module. The AC module tests for various chipset and processor configurations and ensures the platform has an acceptable configuration. It then measures and launches the MLE.
The MLE initialization routine completes system configuration changes (including redirecting INITs, SMIs, interrupts, etc.); it wakes up the responding logical processors (RLPs) and brings them into the measured environment. At this point, all logical processors and the chipset are correctly configured.
At some later point, it is possible for the MLE to exit and then be launched again, without issuing a system reset.
In other words, a special CPU instruction is implemented which, when executed, will quiesce all other processors and devices in the subsystem, essentially putting the system in a known good state. The MLE (trusted process) can then perform measurements without fear of getting corrupted by other agents and the TPM can trust that the measurements it gets (which are stored in PCRs 17 and up) are from a higher privilege domain.
A software implementation that uses Intel TXT is tboot[15] which is supported by some variants of Linux. Of course, even this isn't perfect because S-RTM is used up until tboot is booted which means the same issue with the huge attack surface of UEFI firmware still exists. A proper implementation of Intel TXT has to happen before UEFI is loaded to properly reduce the attack surface. However the point is that the technology exists and if companies like Microsoft (and their OEM partners) want people to adopt TPM, the first thing they should do is to work on properly implementing their software stack to support the level of trust that the hardware is capable of.
The purpose of this article is to dispel the myth around TPM as some sort of general purpose key locker. Significant money and effort have been invested (wasted) in TPM hardware[16] and certification[17], governments[18][19] and organizations require TPM as shorthand for "we are serious about data security," and well-intentioned individuals work on open source projects and academic research to improve TPM security and find new use cases for it. However, the architecture of TPM has the flavour of LLM-generated writing: when you look at it from a distance, it seems to hit all the right notes for a security device (strong keys, side channel protections, fault attack mitigation, key hierarchy diagrams, etc) but if you dig deep into the details, you will realize that TPM was not designed to protect against the real threats that users face. Not in 2009 when it was first proposed, not in 2021 when Windows 11 started requiring it as an excuse to exclude people on older PCs, and not now when you are reading this.
There have been a plethora of attacks on TPM in the past, but we need to be clear that a system being widely attacked does not necessarily mean it is fundamentally insecure, only that there are many implementation issues. Most of these implementation issues do not touch upon the points raised in this article (it doesn't matter if the gate to your garden is strong or weak if there is no fence around the garden). Nevertheless, many of the attacks demonstrate the lack of care and consideration in the TPM ecosystem.
Denis Andzakovic[20] showed in 2019 that a standard man-in-the-middle (MITM) attack on TPM hardware with a cheap FPGA is enough to retrieve BitLocker keys. This is because, as it is currently implemented, Windows (pause for suspense) does not encrypt the traffic to the TPM. While TPM 2.0 does support encrypted sessions, they do not seem to be widely used. So while we can talk about how the CIA used DPA attacks[21] or how freezing the RAM is enough to steal the BitLocker keys[22], none of that matters at all because there is no protection against physical attacks when the TPM sends keys over an insecure channel in plaintext. If you've heard any argument against TPM before, chances are this was the one you've heard. In fact, this was widely known as early as 2010[23] (one year after TPM was introduced). This was demonstrated again in 2021 by Dolos Group[24] and again by @marcan of AsahiLinux fame on his now-deleted Xitter account. Why are encrypted sessions not used? The best guess is that in the current PC ecosystem where you have many TPM vendors, many motherboard vendors, and two CPU vendors, this has probably become an impossible engineering challenge.
By the way, this was Microsoft's response to Denis:
We have completed our investigation, and the behavior that you reported is something we’re aware of. We don’t expect any further action on this item from MSRC and will be closing out the case.
There is nothing new or novel, nor is this unique to Surface, it applies to all dTPMs, both 1.2 & 2.0. Some fTPM resist this attack (MitM dTPM), but you could just do MitM on the memory (or freeze it).
Basically: we know and we don't care. Someone should send an update to Microsoft's marketing team because they seem to have missed the memo that TPM does not protect you against physical attacks. Also: fTPMs are a feature in recent CPUs where the TPM 2.0 device is implemented in the same flawed way (no direct access to the CPU architectural state in order to verify that the measurements are legitimate) but with the advantage that there are no SPI lines to sniff. I also suspect (with no evidence) that these fTPM implementations have none of the hardening against physical key extraction that you get with dTPMs (an attack which, if you weren't following, misses the forest for the trees).
Microsoft also performed their own red team assessment of TPM back in 2006[26] where they pointed out DMA attacks, MITM attacks, and the lack of protection against future measurements. In each case their recommended resolution is to set a TPM PIN code which, by default, only supports numeric digits ("enhanced" PINs are a modern development and require jumping through some hoops to enable).
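To put a numeric PIN in perspective, here is a back-of-the-envelope calculation of how long an exhaustive search takes when the TPM rate-limits guesses. The rate used below is purely an illustrative assumption, not a figure from any vendor or from the sources cited here; substitute the lockout policy of the TPM you are evaluating.

```python
SECONDS_PER_DAY = 86_400


def pin_space(digits: int) -> int:
    """Number of possible PINs made of decimal digits only."""
    return 10 ** digits


def exhaustive_search_days(digits: int, seconds_per_guess: float) -> float:
    """Worst-case time to try every PIN at a fixed, rate-limited pace."""
    return pin_space(digits) * seconds_per_guess / SECONDS_PER_DAY


# Hypothetical rate limit: one guess every ten minutes.
ASSUMED_SECONDS_PER_GUESS = 600

for digits in (4, 6, 8):
    days = exhaustive_search_days(digits, ASSUMED_SECONDS_PER_GUESS)
    print(f"{digits}-digit PIN: {pin_space(digits):>11,} candidates, "
          f"about {days:,.0f} days worst case")
```

Under that assumed rate, a 4-digit PIN falls in roughly 69 days and a 6-digit PIN in roughly 19 years in the worst case (half that on average). The search is slowed down, not eliminated, and the whole calculation only applies to an attacker who actually has to go through the TPM rather than around it.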
In academia, a group found in 2018 that some firmware implementations do not properly restore the S-RTM when waking from sleep[27], and another group found in 2011 that some firmware implementations do not implement SINIT properly in a D-RTM implementation with Intel TXT[28].