
Securing our software supply-chain better with reproducible builds for enclaves

At Monzo we run all kinds of workloads, and some of them are really sensitive: they deal with very secret passwords, or authenticate our employees. So for these, we really want to make sure they are running the code we intend to run on them.

A good example of one of these workloads is our multi-party authorisation system (MPA) for AWS, which we run on AWS Nitro Enclaves. We find that MPA, a system where we require more than one person to approve a sensitive action, is a very effective way to prevent any single employee from holding overly elevated privileges. We have therefore also built MPA into privileged access to our Kubernetes clusters, and we plan to use it to gate access to an increasing number of other systems.

Another example of a sensitive workload is key ceremonies. We spoke about them here a few years ago. For easier automation (and for the sake of remote work!), we also run some of these ceremonies in AWS Nitro Enclaves, as opposed to air-gapped offline machines. How we are automating key ceremonies deserves its own blog post, so expect one very soon!

As we build more Nitro Enclaves, we have learnt a lot about how to better secure and automate deployments. We spoke about this briefly in the Reproducible Builds section of this blog post on MPA.

This post describes what we have learnt since then, and how we have improved the security of our software supply chain for Nitro Enclaves image files (or EIFs) by building tooling that you can find in our open-source aws-nitro-util repository. We also discuss reproducible builds in general and why the process for creating them matters.

What are reproducible builds, and why do you want them?

When we speak of reproducible builds, we mean bit-by-bit identical builds: the same build inputs deterministically yield the same outputs.

For example, if two people build an artefact (which could be an executable binary, a tar or zip archive, etc.), each on a different machine, and the build is reproducible, we expect to get the exact same artefact on both machines. We can verify this by hashing the artefacts and comparing the hashes: a hash always produces the same output for the same input, so if the artefacts differed, the hashes would too.
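As a minimal sketch of this check (the file names are hypothetical), two builds that produce the same bytes must produce the same SHA-256 hash:

```shell
# Simulate two independent builds producing the same bytes
printf 'artefact-bytes' > build-a.bin
printf 'artefact-bytes' > build-b.bin

# Hash both artefacts and compare
hash_a=$(sha256sum build-a.bin | cut -d' ' -f1)
hash_b=$(sha256sum build-b.bin | cut -d' ' -f1)

if [ "$hash_a" = "$hash_b" ]; then
  echo "reproducible: hashes match"
else
  echo "NOT reproducible: hashes differ" >&2
  exit 1
fi
```

In practice the two builds run on different machines (say, an engineer's laptop and CI), and only the hashes need to be exchanged and compared.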

Flow chart depicting two parallel builds, one in an engineer’s machine, and the other in Continuous Integration, where the hashes of the final artefacts are compared.

Reproducible builds are a great way to check whether you can trust an artefact - rather than simply trusting a binary, you can build it yourself to make sure the binary does what you think it does. If some evil hacker had added malicious code to the original, then it would be different from the one you built.

Reproducible builds have more benefits (like consistently reproducing bugs during the build process). If you’d like to read more, the Reproducible Builds project has a great summary.

State of the art: how we were doing reproducible builds

When we implemented multi-party authorisation for AWS, our methodology to achieve the exact same binary twice was the following:

  • Define build environment as a Dockerfile

    • Set hashes in base images

      • like FROM golang:1.21@sha256:405939c...

    • Set versions in package installs

like RUN yum install -y aws-nitro-enclaves-cli-${AWS_NITRO_ENCLAVES_CLI_VERSION}

  • Define build steps as RUN steps in the Dockerfile, making sure each step produces reproducible artefacts. For example, when building Go, we would pass the appropriate flags so that the resulting binary is always the same.
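The approach above can be sketched as a Dockerfile like the following (the digest, version, and build flags are illustrative, not our actual configuration):

```dockerfile
# Base image pinned by digest, not just by tag
FROM golang:1.21@sha256:<pinned-digest>

# Package versions pinned explicitly
ARG AWS_NITRO_ENCLAVES_CLI_VERSION=1.2.2
RUN yum install -y aws-nitro-enclaves-cli-${AWS_NITRO_ENCLAVES_CLI_VERSION}

# Go build flags that strip non-deterministic metadata
# (-trimpath removes local paths; clearing the build ID removes a per-build value)
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-buildid=" -o /out/app ./cmd/app
```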

This process would actually yield consistent platform configuration registers (or PCRs, a kind of hash) of Nitro EIFs, as well as consistent hashes for Go binaries and TAR archives.


Because we could consistently get the same hash across builds, we were initially satisfied with this methodology. But as we used it, we came across the following issues.

  • Nothing stops build steps inside the Docker build from downloading arbitrary binaries from the internet. We control dependencies’ downloads to some extent (via FROM or RUN yum install) but we cannot control what other commands do. If the Nitro CLI wishes to download stuff off the internet during nitro-cli build-enclave, nothing is stopping it. So if this hypothetical file it downloads changed in the future, a future build could yield a different hash!

  • It’s hard to find the sources that went into building an image because we don’t specify all of them in the Dockerfile. For example - does our image contain XZ? If so, which version?

  • While a lot of build dependencies follow from the Dockerfile, some of them are specific to your system. The most obvious one is Docker itself! We noticed this when a change in the Docker API broke our Nitro enclave builds. Even after this was fixed upstream, the fix was not available to us in AWS’ package, because we were not building the Nitro CLI from source as part of the reproducible process.

What we need from reproducible builds

We need to be able to consistently reproduce artefacts to verify that a single engineer has not modified the binary before deploying it, by having someone else also build the binary.

Ideally, the engineers are able to trace the sources that were used during the build. This way, if we used a compromised binary (say, a pre-compiled library we downloaded from the internet) or compromised sources (again, think of the XZ backdoor attack) then we should be able to check. We can call this enumerating our dependencies.

This latter requirement means we need to be able to pin our sources — that is, define exactly which version of each dependency we are using — because different versions of the same dependency use different sources.

In order to pin all of our sources, we must make sure that our build process does not download things from the internet without our knowledge — let’s call this offline sandboxing.

To sum that up, when trying to make a build bit-by-bit reproducible, we would like at least:

  • Dependency enumeration — “these are all of my dependencies”

  • Dependency pinning — “this is the exact version of this dependency”

  • Offline sandboxing — “don’t go downloading dependencies behind my back, and keep the build isolated from the OS”

Using Nix to define our builds

You will have noticed that Docker does not actually fit this bill. While a docker build has its own environment, largely decoupled from your machine’s, nothing stops it from speaking to the internet — you can run arbitrary code in it, so it’s easy for the commands you run during a build to include extra code that you don’t know about.

So Docker can’t enumerate dependencies properly, and it can’t sandbox offline to stop your tools from downloading things.

This is not so much a flaw in Docker as a design choice - it just wasn’t built with these goals in mind! For this reason, we switched to a different tool we had not used before at Monzo: Nix.

Nix allows defining builds as ‘derivations’ — build recipes that clearly state:

  • What you need to build this — the build inputs

  • What building this yields — the build outputs

  • How to transform the inputs into outputs (typically, this is a short shell script)

Nix allows building derivations inside an offline sandbox, giving us the guarantees listed above.

Downloading dependencies from the internet is an option, but only if they are pinned: we specify the expected SHA256 hash of each download in advance, and Nix verifies the download against it.

We can then compose these derivations’ inputs and outputs with each other to chain builds together, a bit like Docker build stages.
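A derivation putting these ideas together might look like this sketch (the package name, URL, and build commands are hypothetical; `lib.fakeSha256` is a placeholder you would swap for the real pinned hash):

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenv.mkDerivation {
  name = "example-artefact";

  # Build input: a source tarball, allowed through the sandbox
  # only because its hash is pinned in advance
  src = pkgs.fetchurl {
    url = "https://example.com/source-1.0.tar.gz"; # illustrative URL
    sha256 = pkgs.lib.fakeSha256; # replace with the real pinned hash
  };

  # How to turn inputs into outputs — runs offline, inside the sandbox
  buildPhase = ''
    make # hypothetical build command
  '';

  # Build output: whatever ends up in $out
  installPhase = ''
    mkdir -p $out/bin
    cp ./app $out/bin/ # hypothetical binary
  '';
}
```

Another derivation can then refer to this one’s output as one of its inputs, which is how builds are chained together.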

🎯 We chose Nix to tackle this problem, but other tools with the same goals include Arch’s PKGBUILD and Guix

Re-thinking how artefacts are assembled

While having a framework that clearly specifies builds and their dependencies helps, the heavy lifting to achieve a reproducible artefact must actually happen at the level of the tool making the artefact. For example, if go build embeds a completely random number inside the binaries it makes, no reproducible build framework is going to save you from that!

In our case, the problematic tool in our build process for assembling enclave image format files (or EIFs) was not the Go compiler, but AWS’ nitro-cli build. For details, Trail of Bits have a great post on how the CLI works and what inputs are needed to build an EIF.

Our problems with the Nitro CLI included (at the time of writing):

  • It requires Docker images as inputs (which, as we already covered, are themselves not reproducible if built with Dockerfiles)

  • It needs a Docker daemon in order to convert a Docker image into an EIF. This makes building EIFs in non-privileged environments like Concourse CI tricky.

  • It timestamps EIFs (so that building the exact same EIF twice would yield different file hashes, despite in theory having matching PCRs).

Note how none of these problems are actually inherent to enclaves. They are just quirks of how AWS’ tool chooses to build enclave images.

Our solution to this is doing away with nitro-cli build entirely. Instead, we can use AWS’ libraries to build EIFs directly, and put together the files necessary to build the EIF ourselves. These files include:

  • The Linux kernel

  • The initial filesystem, packaged as a CPIO archive, which includes the init process executable and whatever else you want to run inside your enclave, including the Nitro kernel module.

Assembling the CPIO archive is a perfect example for a Nix derivation! It takes the cpio program and the files as inputs, and outputs a .cpio archive.
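Such a derivation might look like this sketch (the argument names and the module file name are assumptions; the key detail is sorting the file list so the archive’s entry order is deterministic):

```nix
{ pkgs, initBinary, nitroModule }:

# Packs an initramfs as a CPIO archive in newc format
pkgs.runCommand "initramfs.cpio" { nativeBuildInputs = [ pkgs.cpio ]; } ''
  mkdir root
  cp ${initBinary} root/init        # the enclave's init process
  cp ${nitroModule} root/nsm.ko     # Nitro kernel module (name illustrative)

  # Sort entries so the archive is byte-identical across builds
  (cd root && find . -print0 | sort -z | cpio --null -o -H newc) > $out
''
```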

Putting this together

So far, we have decided to ditch Docker in favour of Nix, and AWS’ Nitro CLI in favour of the library that it wraps — all we need to do now is write the Nix derivations (i.e. the build recipes) to put together the .cpio archives (and their contents).

Flowchart depicting each step when building an EIF, from the sources to the final binary

If you are hoping to deploy AWS Nitro Enclaves reproducibly, don’t worry! You won’t need to write these build recipes yourself because we have decided to open-source ours. You can find them in our aws-nitro-util repository.

Let’s put this end result into practice by building the same EIF twice (some output omitted for brevity):

Code to deploy AWS Nitro Enclaves reproducibly
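The check looks something like this illustrative transcript (the flake attribute and output file names are assumptions, and the hashes are elided):

```
$ nix build .#eif
$ sha256sum result/image.eif
<hash>  result/image.eif

$ nix build .#eif --rebuild
$ sha256sum result/image.eif
<hash>  result/image.eif
```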

The hashes match 🎉 If you build this same EIF (for Linux ARM) at Monzo’s aws-nitro-util/examples at commit ca01f8e, you should get the same result (we hope 🤞).

Let’s try enumerating dependencies now. Was XZ used to build this EIF? If so, which version?

Code enumerating dependencies
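The enumeration looks something like this illustrative transcript, querying the build closure of the result (exact store paths will differ):

```
$ nix-store --query --requisites ./result | grep xz
/nix/store/<hash>-xz-<version>
```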

We can see we did not use XZ 5.6.0 or 5.6.1 — all good 😉


We decided to introduce Nix into our toolchain so we could leverage its core support for derivations to achieve identical, deterministic builds.

As next steps, now that we have a reliable build recipe for enclaves, we would like to build more of the enclave completely from source, including the Linux kernel itself and the init process binary, which are currently provided to us by AWS. We are also hoping to improve our continuous deployment around Nix with our own binary cache.

Opportunities at Monzo

At Monzo, we offer a dynamic and collaborative work environment, competitive salaries, and opportunities for career growth and development. If you're interested in joining our team, visit our careers page for more information on our current job openings.