Securely delegating trust with digital signatures and secret storage systems


A "secret" is any piece of information that we want to tightly control access to, for example passwords, encryption keys, and safe codes. Secrets can have many uses, but at the end of the day they're just very large numbers, often on the order of 2^256 which is too many possibilities to be able to guess from. It could be something like:

115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
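
For illustration, here's how you might generate such a secret: a minimal Go sketch using the standard library's crypto/rand, where 32 random bytes give exactly 2^256 possible values.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

func main() {
	// A 256-bit secret is just 32 bytes of cryptographically secure
	// randomness: one of 2^256 possible values.
	secret := make([]byte, 32)
	if _, err := rand.Read(secret); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", secret)
}
```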

Knowledge of a secret can grant exclusive privileges, like being able to decrypt some information. Another is the ability to prove that you are who you say you are: when you enter your password on a website, you're "proving" that you are the account holder. Asymmetric cryptography lets us take this idea of trusted proofs further. We can create identities and delegate trust to them across our infrastructure, but just like with passwords, it all relies on keeping the underlying secrets safe from attackers.

Previously we've talked about how we manage and control access to secrets, and how we protect our most sensitive secrets from the most determined attackers. To summarise, we use HashiCorp Vault to store secrets such that they're encrypted, immutable, and auditable, with strong guarantees that only the right users can access them. Riskier secrets are stored completely offline and protected with rigorous key ceremonies that provide strong auditability guarantees and require multiple trusted people to participate in any high-risk action. In this post I'll go over how these different secret storage layers work together with asymmetric cryptography to keep our users safe!

How digital signatures can delegate trust

In order to be able to delegate trust we need to be able to both make trustable statements and verify that a given statement is valid. This is possible using digital signatures which we use to make sure that messages cannot be forged or changed without us knowing. Digital signatures come from public key cryptography, which is a branch of cryptography that uses pairs of keys (asymmetric) instead of a single key (symmetric).

Here's an example. Alice generates a key-pair which is composed of her private key (only she knows it) and her public key (she shares it with others). Given a signed message from Alice, whose public key we know, we can always check that it was actually Alice who wrote the message, and we’re always able to detect if the message was changed after she wrote it. This works since only Alice’s private key can sign messages that verify under her public key.
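
As a concrete sketch of this, here's what signing and verifying could look like using the Ed25519 implementation in Go's standard library (the message here is made up purely for illustration):

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

func main() {
	// Alice generates her key-pair: the private key stays with her,
	// the public key can be shared with anyone.
	pub, priv, err := ed25519.GenerateKey(nil) // nil reader uses crypto/rand
	if err != nil {
		panic(err)
	}

	msg := []byte("I agree to pay Bob £10")
	sig := ed25519.Sign(priv, msg)

	// Anyone holding Alice's public key can check the signature...
	fmt.Println(ed25519.Verify(pub, msg, sig)) // true

	// ...and any change to the message makes verification fail.
	tampered := []byte("I agree to pay Bob £100")
	fmt.Println(ed25519.Verify(pub, tampered, sig)) // false
}
```

Changing even one character of the message invalidates the signature, which is exactly the tamper-detection property we rely on.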

Internet security today relies heavily on Transport Layer Security (TLS), which uses digital signatures to verify the authenticity of websites and servers that clients connect with. Servers present a certificate chain to the client, which is usually made up of a root certificate, an intermediate signed by the root, and a leaf certificate signed by the intermediate.

A diagram depicting the chain of trust from a root certificate authority, through intermediate certificates, to leaf certificates.

The leaf certificate contains some claims like “I am in control of the domain example.com until January 1st 1984”, and the client verifies that these claims can be trusted by checking that the leaf has a valid signature from the intermediate, the intermediate has a valid signature from the root, and the root is a certificate authority that is known and trusted by the client.

If all this checks out, the handshake proceeds and the peers establish a secure, encrypted connection. As shown in the diagram, the client’s trust of the root authority was delegated to the intermediate which in turn delegated it to the leaf, and the client is able to verify this “chain” back to the root.
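
You can see this verification in practice with a short Go sketch: crypto/tls validates the chain against the system's trusted roots during the handshake, and exposes the verified chain afterwards (example.com is just an illustrative host):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Dial a TLS server; crypto/tls verifies the presented certificate
	// chain against the system's trusted roots during the handshake.
	conn, err := tls.Dial("tcp", "example.com:443", nil)
	if err != nil {
		panic(err) // e.g. the chain didn't verify
	}
	defer conn.Close()

	// Walk the verified chain: leaf -> intermediate(s) -> root.
	for _, chain := range conn.ConnectionState().VerifiedChains {
		for _, cert := range chain {
			fmt.Println(cert.Subject.CommonName)
		}
	}
}
```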

If there was no verification of this “chain of trust”, an attacker would be able to present the client with their own certificate and the client would have no way to tell if it was legitimate. This is called a machine-in-the-middle attack and it allows an attacker to see all information exchanged over the network and even insert their own!

An important side note is that in the above setup, the client can be sure of the server’s identity, but the server doesn’t know anything about the authenticity of the client. This is a good tradeoff between complexity and security for use in public websites. However, in an internal system where we have lots of control over every computer, we can implement mutual TLS so that trust is established both ways. This allows for much stronger security guarantees... but deserves a blog post of its own. 👀

The key takeaway here is that trust underpins the entire security model, and digital signatures allow us to prove, delegate, and verify trust.

Secret-storage systems

There are a few things that help us decide where and how to store a secret. In particular:

  • who needs access

  • how frequently they need it

  • how damaging it would be if the secret was accessed by anyone other than its owner

On one end of the scale, we have secrets that need to be accessed frequently, with minimal friction and latency, by automated services. The attack surface here is wider because of the loose access requirements, but the risk is small since these secrets expire very quickly and have a relatively small impact if leaked (what we call 'blast radius'). For example, a compromised private key for a TLS server’s leaf certificate can only be used to spoof connections to that website, and only until the certificate expires, which could be a matter of days. We store these types of secrets in locked-down HashiCorp Vault instances where they can only be read by authenticated and authorised services, or exclusively in-memory inside the services that own them.
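
As a rough sketch of what reading such a secret looks like, here's the official Vault Go client in action. The secret path is made up for illustration, and a real service would authenticate with its own identity (e.g. Kubernetes or AppRole auth) rather than an environment-variable token:

```go
package main

import (
	"fmt"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// DefaultConfig reads VAULT_ADDR from the environment, and NewClient
	// picks up VAULT_TOKEN; production services authenticate differently.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// "secret/data/payments/signing-key" is an illustrative KV v2 path.
	secret, err := client.Logical().Read("secret/data/payments/signing-key")
	if err != nil {
		panic(err) // unauthorised reads are rejected (and audited)
	}
	fmt.Println(secret.Data)
}
```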

On the other end of the scale are secrets that are accessed so rarely that even an extremely time-consuming, high-friction access procedure is tolerable in exchange for a drastically reduced attack surface and high levels of assurance. These secrets are stored completely offline in secure locations, and accessing them requires multiple trusted people to cooperate in an air-gapped “key ceremony”.

Side note: Requiring the cooperation of multiple trusted parties is achieved using Shamir’s Secret Sharing, where a secret is split into n different pieces such that any k of them (for a chosen threshold k ≤ n) can together reconstruct the secret, but any fewer than k pieces reveal nothing about it. This provides both strong access controls and redundancy in case some shares are lost.
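
HashiCorp publish the Shamir implementation that Vault itself uses as a Go package, so a minimal sketch (assuming its Split/Combine API) looks like this:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/vault/shamir"
)

func main() {
	secret := []byte("a very sensitive 256-bit value!!")

	// Split into n = 5 shares, any k = 3 of which reconstruct the secret.
	shares, err := shamir.Split(secret, 5, 3)
	if err != nil {
		panic(err)
	}

	// Any 3 shares are enough to recover the original...
	recovered, err := shamir.Combine(shares[:3])
	if err != nil {
		panic(err)
	}
	fmt.Println(string(recovered)) // the original secret

	// ...but fewer than 3 yield garbage rather than an error, so the
	// threshold has to be agreed out of band.
	wrong, _ := shamir.Combine(shares[:2])
	fmt.Println(string(wrong) == string(secret)) // false
}
```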

Somewhere in the middle are secrets that have a fairly large blast radius but also need to be accessed frequently. These are also stored in locked-down Vault instances, but the difference is that they never leave Vault for their entire existence. Examples of secrets in this category are private keys for intermediate certificates. These are extremely sensitive, but they could be needed thousands of times per day to renew leaf certificates. There is still very little risk in this model since intermediates have much shorter lifespans than roots, and accessing the contents of Vault requires the cooperation of multiple trusted people (and even then it’s impossible to do so without raising many alarm bells 🚨).
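
To illustrate, here's roughly what issuing a leaf certificate through Vault's PKI secrets engine looks like with the Go client. The mount path and role name are made up, and the intermediate's private key stays inside Vault throughout:

```go
package main

import (
	"fmt"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Ask the PKI secrets engine for a short-lived leaf certificate.
	// Vault signs it internally with the intermediate's private key;
	// "pki_int" and "example-dot-com" are illustrative names.
	issued, err := client.Logical().Write("pki_int/issue/example-dot-com",
		map[string]interface{}{
			"common_name": "service.example.com",
			"ttl":         "72h",
		})
	if err != nil {
		panic(err)
	}

	// The response contains the leaf certificate and its private key;
	// the intermediate's key is never returned.
	fmt.Println(issued.Data["certificate"])
}
```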

This tradeoff between convenience and assurance shows up everywhere in security. For example, password managers and two-factor authentication make it much more difficult for an attacker to take over an account, but they add some friction to the login process. Picking the right balance is crucial.

A graph plotting convenience on the x axis against security assurance on the y axis. A post-it note with a password on it sits at high convenience but low assurance, while an offline process requiring multiple trusted parties sits at high assurance but low convenience.

We’re also looking forward to a few nice security features becoming standardised and widely available so that we can apply defence in depth and make our secrets management even more secure!

Linux kernel 5.14 introduces memfd_secret, which allows a process to create a region of memory that is hidden even from the kernel itself (its pages are removed from the kernel’s direct memory map), and ensures that the contents of this memory are wiped after use. This would make it extremely difficult, even for an attacker with full control over a computer, to access sensitive information.
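
As a hedged sketch of how a program might use this, assuming golang.org/x/sys/unix exposes the syscall as MemfdSecret (and noting that the kernel may require the secretmem.enable=1 boot parameter):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// memfd_secret(2) is Linux-only (kernel 5.14+) and may need to be
	// enabled with the secretmem.enable=1 boot parameter.
	fd, err := unix.MemfdSecret(0)
	if err != nil {
		panic(err) // e.g. ENOSYS on older kernels
	}
	defer unix.Close(fd)

	const size = 32 // room for a 256-bit key
	if err := unix.Ftruncate(fd, size); err != nil {
		panic(err)
	}

	// Map the secret area; the kernel removes these pages from its own
	// direct map, so they're invisible to most of the system.
	buf, err := unix.Mmap(fd, 0, size, unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer unix.Munmap(buf) // pages are wiped when the mapping goes away

	copy(buf, []byte("an example 256-bit secret value!"))
	fmt.Println("secret stored in isolated memory")
}
```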

Another exciting feature is the recently accepted proposal to allow secure key erasure in Go. This gives Go programs higher assurance that memory regions containing sensitive information will not be copied by the garbage collector and will be wiped before the memory is released to the operating system. A few years ago I made memguard, a Go library that uses unsafe code to solve this problem without help from the Go runtime, and I’m happy that these workarounds will soon be needed less.
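
Based on memguard's documented API, using it looks roughly like this:

```go
package main

import (
	"fmt"

	"github.com/awnumar/memguard"
)

func main() {
	// Wipe all sensitive buffers if the process is interrupted.
	memguard.CatchInterrupt()
	defer memguard.Purge()

	// Generate a 32-byte key inside an encrypted enclave.
	key := memguard.NewEnclaveRandom(32)

	// Decrypt it into a locked, guarded buffer only while it's needed.
	buf, err := key.Open()
	if err != nil {
		panic(err)
	}
	defer buf.Destroy() // wipes the memory before releasing it

	fmt.Printf("%x\n", buf.Bytes())
}
```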

Conclusion

Authentication is a critical aspect of secure inter-service communication, and public key cryptography provides the tools needed to establish trust between components in our infrastructure. Cryptographic tools in general usually rely on the secrecy of key material, and we use a combination of different secret storage systems to implement a balance between security assurances and usability.

We’ll soon publish more posts covering the design of our public key infrastructure in more detail: how services issue certificates, how we make use of mutual TLS, and our approach to designing secure infrastructure here at Monzo.


If you’re interested in this kind of stuff we’d love to have you on our team!