How we've started to measure 'software excellence'

In the Backend Platform team at Monzo, one of our goals is to make sure that the software we build and maintain across the company is high quality. We want to empower our engineers to create the best software possible. Great quality software means the best possible service for our users and allows us to stay nimble.

As a product grows, keeping tabs on software quality becomes more difficult. There are more places for bad practices to hide and finding issues becomes more of a challenge. We thought about how we could tackle this problem and came up with a list of goals:

  • We want to have better overall visibility of our software quality -- a lack of visibility into the current state of affairs makes it hard to prioritise improvements

  • We want to remove subjectivity from what constitutes quality -- different people having different ideas can lead to inconsistencies, so we want to standardise this somehow

  • We want to make software quality work feel fun -- often the work to improve quality does not feel rewarding

  • We want good metrics about how the quality of our software is trending over time

  • Importantly, we want to improve software quality across the organisation, accounting for the constantly evolving definition of what "quality" means in software

So we've tried to build a solution. We call it Software Excellence, and it's designed to track and monitor software quality across the company, surfacing the places where we can make improvements and making the overall process of raising software quality quantifiable and fun.

This is our first attempt to do something like this, and there's still plenty of room for improvements. This post is all about how far we've got. We're going to keep iterating as we learn more. 😄

Assessing software quality in our platform

From the very start, we built our backend as a collection of distributed microservices. One good way of assessing overall software quality is to make sure that each individual microservice meets our standards. If we somehow had a report describing the "software excellence" of each microservice, we would have a clear idea of where to direct our engineering efforts.

But this is challenging because there are now over 1,800 microservices that make up Monzo. For a sense of scale, here's an image we shared in a previous post, back when we had around 1,500 services. Each service is represented by a dot, and every line is an enforced network rule.

A graph showing 1500 microservices in our architecture and the connections between them.

So:

  • Ownership of the microservices that make up our backend is spread across many teams within Monzo

  • Manually assessing the code of each microservice for quality would be slow, difficult, and error prone

  • We want to be proactive in promoting good software engineering practices across the company by continuously monitoring quality

  • We want to make sure that everybody is following the same standards

With that in mind, our Software Excellence system is built as an extensible platform for automatically measuring the quality of microservices at Monzo and surfacing the results to service owners. It addresses the challenges above because:

  • The owner of each microservice is responsible for its quality

  • Software quality measurements are automated

  • Software quality is measured continuously, and reacts quickly to changes

Designing a system for tracking quality

Our first step was to build a software catalogue. Only by having a canonical list of the components in our platform could we begin to assess quality. We did this by creating descriptor files for each microservice. The descriptors hold metadata like the owner of the microservice, a description of what it does, and which business function it's used for. We automated as much of this as we possibly could, but some things (like writing a description for the purpose of each microservice) had to be done manually.
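To make this concrete, here's a rough Go sketch of the kind of metadata a descriptor might hold. The type and field names are illustrative assumptions, not our actual schema.

```go
// A sketch of the metadata a service descriptor might hold. The type and
// field names here are illustrative assumptions, not our actual schema.
package catalogue

type ServiceDescriptor struct {
	Name             string `json:"name"`              // e.g. "service.ledger"
	Description      string `json:"description"`       // written by hand by the owning team
	Owner            string `json:"owner"`             // the squad that owns the service
	BusinessFunction string `json:"business_function"` // the business function it supports
}
```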

We then started to think about how we wanted Software Excellence to work. Here's what we came up with!

For each of the microservices we have at Monzo, the Software Excellence system should assign a grade representing software quality. The system should also compute grades for various relevant categories, such as documentation quality, test quality, and observability. These categories should be able to morph and change over time as we continue to evolve our understanding of software quality at Monzo.

For grades, we didn't want to show engineers a number representing their score, because we were concerned that it would encourage overfitting to the standards. We settled on the following possible grades:

  • Needs Improvement

  • Poor

  • Okay

  • Good

  • Excellent
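To give a feel for how this could look in code, here's a minimal Go sketch of a grade type, assuming the list above runs from lowest to highest. Internally a score can still exist, but only the named grade is surfaced to engineers. The type and values are hypothetical.

```go
// Grade models the named grades surfaced to engineers. The ordering assumes
// the list above runs from lowest to highest; the type itself is hypothetical.
package excellence

type Grade int

const (
	NeedsImprovement Grade = iota
	Poor
	Okay
	Good
	Excellent
)

func (g Grade) String() string {
	return [...]string{"Needs Improvement", "Poor", "Okay", "Good", "Excellent"}[g]
}
```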

As well as presenting grade information, we wanted to present service owners with a customised, actionable set of tasks that they can carry out to improve their grade. This would give engineers a detailed view of the current state of their software quality and an awareness of what they have to do to make it better.

It's really important that our services clear the bar for high-quality software, and even more so for our most critical services. For example, we need to be confident that the code behind service.ledger (which sits at the bottom of the core banking architecture and is responsible for maintaining accurate financial information) is of great quality. For this reason, we also introduced the idea of service tiering, where services are assigned a tier representing their criticality. Tier 0 services are the most business-critical, tier 3 services are the least, and tiers 1 and 2 sit in between.

Naturally, we are most interested in improving the software quality of our tier 0 services first, and Software Excellence allows us to filter our scope as needed.
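As a sketch of how tiering narrows our focus, the snippet below filters a catalogue down to its most critical services. The types and helper are hypothetical.

```go
// A hypothetical sketch of service tiering: tier 0 is the most
// business-critical, tier 3 the least.
package catalogue

type TieredService struct {
	Name string
	Tier int // 0 (most critical) to 3 (least critical)
}

// mostCritical keeps only the services at or below the given tier,
// e.g. mostCritical(services, 0) narrows the scope to tier 0 services.
func mostCritical(services []TieredService, maxTier int) []TieredService {
	var out []TieredService
	for _, s := range services {
		if s.Tier <= maxTier {
			out = append(out, s)
		}
	}
	return out
}
```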

What our engineers see

Engineers at Monzo interact with the Software Excellence system via a web-app frontend, which is powered by the brilliant Backstage, Spotify's open platform for building developer portals.

The landing page is a catalogue of microservices at Monzo, giving a bird's-eye view of our platform. The catalogue can be filtered by various criteria, which is useful for a team looking to see how their services are doing.

A screenshot of Backstage showing some components owned by the Backend Platform Squad.

Selecting a service leads you to the component view, where you can see more information about a microservice. This is what an engineer would see looking at the service.acceptance-tests service in the component view.

A screenshot of Backstage showing a single component and its excellence grade.

Clicking on the "Excellence" tab within the component page shows the Software Excellence score for the service.

A screenshot showing the overall excellence 'grade' for an individual component.

Scrolling down shows the tasks required to get to the next grade. Clicking an incomplete action expands to show more details:

A screenshot of a component's grade and the actions required to improve it.

We also have a Software Excellence Looker dashboard for a holistic view of how our platform is doing. For example, we can see the number of entities by dimension and grade:

A screenshot of our Looker dashboard where we track how each excellence entity is trending over time.

The backend architecture of Software Excellence

Most engineers interact with Software Excellence through Backstage, but we wanted it to be accessible as an API too so that we can integrate it into other applications easily. For example, we could (and hopefully will!) implement a Slack bot that pings a channel if excellence scores start slipping. To facilitate this, we decided to implement Software Excellence as a collection of microservices.

Underpinning everything is service.software-facts, which collects and records facts about microservices on our platform (a process called "fact collection") and offers an RPC-based API for fetching them. By "fact", we mean a measurable property of a microservice or system: for example, the number of alerts configured, a score for the readability of the documentation, or a score for the service's test coverage. Facts are proxies for software quality, and a big challenge of implementing the system was deciding which measurable properties of our platform were useful for inferring it.
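As a rough illustration of the shape of this API, here's a Go sketch of what a fact and a fetch endpoint might look like. The names, fields, and signature are assumptions based on the description above, not our real types.

```go
// A hypothetical sketch of a "fact" and the read API that
// service.software-facts exposes for fetching facts about a service.
package softwarefacts

import "time"

// Fact is a single measurable property of a service, collected automatically.
type Fact struct {
	Service     string    // e.g. "service.ledger"
	Name        string    // e.g. "alerts_configured", "test_coverage"
	Value       float64   // the measured value
	CollectedAt time.Time // when the last collection run recorded it
}

// ListFactsRequest and ListFactsResponse sketch an RPC for fetching all the
// facts currently known about a single service.
type ListFactsRequest struct {
	Service string
}

type ListFactsResponse struct {
	Facts []Fact
}

// ListFacts would be served over our internal RPC layer; here it's a plain
// function for illustration.
func ListFacts(req *ListFactsRequest) (*ListFactsResponse, error) {
	// ...load the latest facts for req.Service from storage...
	return &ListFactsResponse{}, nil
}
```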

The fact collection process is invoked periodically via a cron job, and can be triggered as part of our CI pipeline or via Firehose (our NSQ-based messaging system). Facts are updated quickly following code changes.
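To show the event-driven path, here's a sketch of a consumer that re-collects facts whenever it hears a service has changed. It uses the open-source go-nsq client directly as a stand-in for our Firehose abstraction; the topic name and message shape are made up.

```go
// A sketch of triggering fact collection from a message stream. The go-nsq
// client stands in for our Firehose abstraction; the topic name and message
// shape are invented for illustration.
package main

import (
	"encoding/json"
	"log"

	nsq "github.com/nsqio/go-nsq"
)

type deployEvent struct {
	Service string `json:"service"`
}

func collectFactsFor(service string) error {
	// ...run the fact collectors for this service and persist the results...
	log.Printf("collecting facts for %s", service)
	return nil
}

func main() {
	consumer, err := nsq.NewConsumer("service.deployed", "software-facts", nsq.NewConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Re-collect facts as soon as a service changes, so excellence scores
	// react quickly rather than waiting for the next cron run.
	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
		var ev deployEvent
		if err := json.Unmarshal(m.Body, &ev); err != nil {
			return err
		}
		return collectFactsFor(ev.Service)
	}))

	if err := consumer.ConnectToNSQLookupd("nsqlookupd:4161"); err != nil {
		log.Fatal(err)
	}
	select {} // block while messages are handled
}
```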

Downstream, the facts provided by service.software-facts are the foundation for the Software Excellence scores we award each service. The service responsible for creating these scores is service.software-excellence. It's another RPC-based API: it fetches fact information from service.software-facts and passes it through an algorithm that calculates grades and actions.
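Here's a hypothetical sketch of that grading step: facts come in, and a grade plus a set of outstanding actions come out. The rules, thresholds, and names are all invented for illustration; the real algorithm and categories are more involved.

```go
// A hypothetical sketch of grading: each rule checks the facts and, if it
// fails, contributes an action; the pass ratio maps onto a named grade.
package excellence

type Report struct {
	Grade   string
	Actions []string
}

// rule pairs a check against the facts with the action suggested if it fails.
type rule struct {
	passes func(facts map[string]float64) bool
	action string
}

var rules = []rule{
	{func(f map[string]float64) bool { return f["alerts_configured"] >= 1 }, "Configure at least one alert"},
	{func(f map[string]float64) bool { return f["test_coverage"] >= 0.7 }, "Raise test coverage above 70%"},
	{func(f map[string]float64) bool { return f["readme_readability"] >= 0.5 }, "Improve the README"},
}

// GradeService turns a service's facts into a grade and a to-do list.
// The cut-offs below are made up; they're just here to show the shape.
func GradeService(facts map[string]float64) Report {
	passed := 0
	var actions []string
	for _, r := range rules {
		if r.passes(facts) {
			passed++
		} else {
			actions = append(actions, r.action)
		}
	}

	ratio := float64(passed) / float64(len(rules))
	grade := "Needs Improvement"
	switch {
	case ratio >= 0.9:
		grade = "Excellent"
	case ratio >= 0.75:
		grade = "Good"
	case ratio >= 0.5:
		grade = "Okay"
	case ratio >= 0.25:
		grade = "Poor"
	}
	return Report{Grade: grade, Actions: actions}
}
```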

To make it easy to interact with the Software Excellence service from a web-app, we also provide service.iapi.software-excellence, which is a thin wrapper around service.software-excellence that provides a GraphQL frontend.
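To give a feel for how thin this wrapper is, here's a sketch using the open-source github.com/graphql-go/graphql package. The schema, field names, and the fetchGrade helper are invented; the real service delegates to service.software-excellence over RPC.

```go
// A sketch of a thin GraphQL layer over the RPC API. The schema, field names,
// and fetchGrade helper are invented; the graphql-go package stands in for
// whatever GraphQL tooling the real service uses.
package main

import (
	"log"

	"github.com/graphql-go/graphql"
)

// fetchGrade stands in for an RPC call to service.software-excellence.
func fetchGrade(service string) (string, error) {
	return "Good", nil
}

func main() {
	queryType := graphql.NewObject(graphql.ObjectConfig{
		Name: "Query",
		Fields: graphql.Fields{
			"excellenceGrade": &graphql.Field{
				Type: graphql.String,
				Args: graphql.FieldConfigArgument{
					"service": &graphql.ArgumentConfig{Type: graphql.NewNonNull(graphql.String)},
				},
				// The resolver simply delegates to the backend service.
				Resolve: func(p graphql.ResolveParams) (interface{}, error) {
					return fetchGrade(p.Args["service"].(string))
				},
			},
		},
	})

	schema, err := graphql.NewSchema(graphql.SchemaConfig{Query: queryType})
	if err != nil {
		log.Fatal(err)
	}

	result := graphql.Do(graphql.Params{
		Schema:        schema,
		RequestString: `{ excellenceGrade(service: "service.acceptance-tests") }`,
	})
	log.Printf("%+v", result.Data)
}
```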

Last but not least, we build our web-app for Software Excellence on top of Backstage. Internally, this is named web.backstage.

The following diagram illustrates the architecture:

A diagram showing the overall architecture of our Software Excellence system.

How we use Software Excellence

The Software Excellence system has been a valuable tool for improving software quality at Monzo. We use it to set quality-related goals for ourselves -- goals with clear accountability that are actionable and quantifiable.

What's next for Software Excellence?

No system is perfect straight away, and Software Excellence is no exception. That's why we're taking an iterative approach. We've asked our engineering squads what they think could be done better, and we're aware of a few pain points with the current implementation.

For a start, Software Excellence doesn't track everything yet, and we want to expand its remit to cover all types of components. Microservices are a big part of our platform, but we also want to extend the same care and attention to our web apps, infrastructure, cron services, and so on.

Another thing we spotted is that facts are proxies for software quality, and sometimes they have a low signal-to-noise ratio. There are a few facts in our current system whose usefulness we're not sure about. For example, one of our facts concerns how much documentation a service has, and we assign a higher excellence score to services with more docs. After seeing how this works in practice, we're a bit sceptical: having documentation is good, but having more documentation isn't always better. Quality beats quantity.

As a result, we're reconsidering the usefulness of some of our facts, and we're talking to our engineers to figure out a fact set that gives a more reliable gauge of quality.

We also realised that some facts are just not relevant to particular services. This happens for various reasons. For example, very thin API wrapper services that delegate to a backend service often just aren't complex enough to warrant a lot of the actions suggested by Software Excellence. This can be frustrating for engineers when their Software Excellence grade is docked because of a fact that isn't relevant to them. To address this, we've added individual overrides for facts so that they can be turned off on a case-by-case basis. We trust that our engineers know best.
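As a sketch, an override could be as simple as a per-service list of facts to skip, which the grading step then filters out. The names here are hypothetical.

```go
// A hypothetical sketch of per-service fact overrides: facts listed in
// DisabledFacts are simply ignored when the service is graded.
package excellence

type Overrides struct {
	DisabledFacts []string // e.g. ["test_coverage"] for a thin API wrapper
}

// applyOverrides drops any disabled facts before grading.
func applyOverrides(facts map[string]float64, o Overrides) map[string]float64 {
	disabled := make(map[string]bool, len(o.DisabledFacts))
	for _, name := range o.DisabledFacts {
		disabled[name] = true
	}

	filtered := make(map[string]float64, len(facts))
	for name, value := range facts {
		if !disabled[name] {
			filtered[name] = value
		}
	}
	return filtered
}
```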

If designing and creating systems like this sounds interesting, why not check out some of our open roles? We're continually looking for engineering talent, so get in touch. 🚀