Advogato: Blog for mjg59

Tor, TPMs and service integrity attestation

One of the most powerful (and most scary) features of TPM-based measured boot is the ability for remote systems to request that clients attest to their boot state, allowing the remote system to determine whether the client has booted in the correct state. This involves each component in the boot process writing a hash of the next component into the TPM and logging it. When attestation is requested, the remote site gives the client a nonce and asks for an attestation, the client OS passes the nonce to the TPM and asks it to provide a signed copy of the hashes and the nonce and sends them (and the log) to the remote site. The remoteW site then replays the log to ensure it matches the signed hash values, and can examine the log to determine whether the system is trustworthy (whatever trustworthy means in this context).

When this was first proposed people were (justifiably!) scared that remote services would start refusing to work for users who weren't running (for instance) an approved version of Windows with a verifiable DRM stack. Various practical matters made this impossible. The first was that, until fairly recently, there was no way to demonstrate that the key used to sign the hashes actually came from a TPM[1], so anyone could simply generate a set of valid hashes, sign them with a random key and provide that. The second is that even if you have a signature from a TPM, you have no way of proving that it's from the TPM that the client booted with (you can MITM the request and either pass it to a client that did boot the appropriate OS or to an external TPM that you've plugged into your system after boot and then programmed appropriately). The third is that, well, systems and configurations vary so much that outside very controlled circumstances it's impossible to know what a "legitimate" set of hashes even is.

As a result, so far remote attestation has tended to be restricted to internal deployments. Some enterprises use it as part of their VPN login process, and we've been working on it at CoreOS to enable Kubernetes clusters to verify that workers are in a trustworthy state before running jobs on them. While useful, this isn't terribly exciting for most people. Can we do better?

Remote attestation has generally been thought of in terms of remote systems requiring that clients attest. But there's nothing that requires things to be done in that direction. There's nothing stopping clients from being able to request that a server attest to its state, allowing clients to make informed decisions about whether they should provide confidential data. But the problems that apply to clients apply equally well to servers. Let's work through them in reverse order.

We have no idea what expected "good" values are

Yes, and this is a problem. CoreOS ships with an expected set of good values, and we had general agreement at the Linux Plumbers Conference that other distributions would start looking at what it would take to do the same. But how do we know that those values are themselves trustworthy? In an ideal world this would involve reproducible builds, allowing anybody to grab the source code for the OS, build it locally and verify that they have the same hashes.

Ok. So we're able to verify that the booted OS was good. But how about the services? The rkt container runtime supports measuring each container into the TPM, which means we can verify which container images were started. If container images are also built in such a way that they're reproducible, users can grab the source code, rebuild the container locally and again verify that it has the same hashes. Users can then be sure that the remote site is running the code they're looking at.

Or can they? Not really - a general purpose OS has all kinds of ways to inject code into containers, so an admin could simply replace the binaries inside the container after it's been measured, or ptrace() the server, or modify rkt so it generates correct measurements regardless of the image or, well, there's lots they could do. So a general purpose OS is probably a bad idea here. Instead, let's imagine an immutable OS that does nothing other than bring up networking and then reads a config file that tells it which container images to download and run. This reduces the amount of code that needs to support reproducible builds, making it easier for a client to verify that the source corresponds to the code the remote system is actually running.

Is this sufficient? Eh sadly no. Even if we know the valid values for the entire OS and every container, we don't know the legitimate values for the system firmware. Any modified firmware could tamper with the rest of the trust chain, making it possible for you to get valid OS values even if the OS has been subverted. This isn't a solved problem yet, and really requires hardware vendor support. Let's handwave this for now, or assert that we'll have some sidechannel for distributing valid firmware values.

Avoiding TPM MITMing

This one's more interesting. If I ask the server to attest to its state, it can simply pass that through to a TPM running on another system that's running a trusted stack and happily serve me content from a compromised stack. Suboptimal. We need some way to tie the TPM identity and the service identity to each other.

Thankfully, we have one. Tor supports running services in the .onion TLD. The key used to identify the service to the Tor network is also used to create the "hostname" of the system. I wrote a pretty hacky implementation that generates that key on the TPM, tying the service identity to the TPM. You can ask the TPM to prove that it generated a key, and that allows you to tie both the key used to run the Tor service and the key used to sign the attestation hashes to the same TPM. You now know that the attestation values came from the same system that's running the service, and that means you know the TPM hasn't been MITMed.

How do you know it's a TPM at all?

This is much easier. See [1].

There's still various problems around this, including the fact that we don't have this immutable minimal container OS, that we don't have the infrastructure to ensure that container builds are reproducible, that we don't have any known good firmware values and that we don't have a mechanism for allowing a user to perform any of this validation. But these are all solvable, and it seems like an interesting project.

"Interesting" isn't necessarily the right metric, though. "Useful" is. And I think this is very useful. If I'm about to upload documents to a SecureDrop instance, it seems pretty important that I be able to verify that it is a SecureDrop instance rather than something pretending to be one. This gives us a mechanism.

The next few years seem likely to raise interest in ensuring that people have secure mechanisms to communicate. I'm not emotionally invested in this one, but if people have better ideas about how to solve this problem then this seems like a good time to talk about them.

[1] More modern TPMs have a certificate that chains from the TPM's root key back to the TPM manufacturer, so as long as you trust the TPM manufacturer to have kept control of that you can prove that the signature came from a real TPM

comments

Syndicated 2016-11-10 20:48:30 from Matthew Garrett

10 Nov 2016 mjg59 » (Master)

We have no idea what expected "good" values are

Avoiding TPM MITMing

How do you know it's a TPM at all?