Machine Learning at Monzo in 2021

Last year, I wrote the “Machine Learning @ Monzo in 2020” post as we started a round of hiring for the team. Our hiring season has been underway for some time again, and one of the questions I get from candidates who have seen my previous post is “what has changed, and what has remained the same?”

The high-level structure of Data Science at Monzo has not radically changed: Machine Learning is one of the areas that falls under the general Data Science umbrella, alongside Data Platform Engineering, Analytics Engineering, Product Data Science, Decision Science, and Data Science within specific domains — like Operations and Borrowing.

We also haven’t changed the shape of the job. Folks with the words ‘machine learning’ in their job title spend their time designing, building and analysing systems that use machine learning: we are Type B Data Scientists. Our impact is not measured by our models, but how impactful those models are when put to use.

So, what has changed?

Interest in machine learning has grown across Monzo, bolstered by our work in financial crime

There are many different types of fraud that are used to trick customers of all banks into sending money to fraudsters: being scammed or defrauded can be both a devastating experience for our customers and can be very expensive for Monzo.

In early 2020, around the same time that the world was closing down due to the pandemic, we shipped our first fraud detection model. After several iterations of this model and (critically) by working on how our systems used those model predictions, we were able to make an enormous dent into this problem. This system has recently been nominated for “Outstanding Prevention Initiative” at the 2021 Tackling Economic Crime Awards.

At the end of last year, I also promoted one of our Machine Learning Scientists, Danai, to be our Fincrime Machine Learning Lead — this area has been going from strength to strength since then. We have moved on to applying the same methodological approach to other types of common scams and this remains an active and important piece of our work today.

Fraud detection is a fascinating area from the machine learning perspective: it is dynamic, fast moving, data rich, and highly imbalanced. It pushes us to think pragmatically and creatively, to work closely with domain experts who have seen these scams before, and to stay close to the customer experience. Working in this space has supercharged our usage of shadow deployments, and has brought renewed emphasis on the importance of working end-to-end: accurate models that are not used to trigger the “right” types of interventions are not impactful.

One of the interesting side-effects of shipping so many systems in financial crime (and celebrating their success) is that it triggered other parts of Monzo to start thinking about how they could apply machine learning in their own area: I am having more conversations about how we can apply machine learning into areas of the business that we have historically not worked with before. There are many, and one of the reasons we’re continuing to grow the team is so that we can step into these areas as well.

Machine Learning has tripled in size and changed shape

In late 2020, the team consisted of two Machine Learning Scientists (Danai and Ellie) and myself. We used to alternate between embedding ourselves into Product/ Engineering Squads and inviting others to embed with us to work on specific projects, and we would generally set our own goals.

Since early 2021, we’ve onboarded three new Machine Learning Scientists (Anna, Ibrahim, and Kurt) and welcomed our first Backend Engineer (Charlie) into the team. And we’re continuing to grow! We have four new people who have joined or accepted to join our team in the coming months.

Our team is now at an inflection point: we’re moving away from being a single centralised team, but we’re not yet at the stage where machine learning is its own discipline in the company. The mid-point that we’re using between these two is called a hub-and-spoke model, which is now being adopted by many areas of Data Science at Monzo (including our colleagues in Analytics Engineering).

In a hub-and-spoke model, the spokes are folks who are permanently part of a specific area of the business, and the hub is where we can often work on ideas that are too early to have a permanent Machine Learning Scientist. Currently, we have two people in Financial Crime, two people in Customer Operations, two people in the Hub, and myself. We’re focusing on fraud prevention, risk assessments across different types of bank accounts, NLP for customer support, vulnerability detection, and on automating processes around transactions that customers report problems with.

All new joiners are onboarded in the Hub with a project that requires them to visit the end-to-end lifecycle of a machine learning system within their first 3 months. This can sound fairly daunting! However, we’ve recently used these onboarding projects to ship improvements to our text and image classification systems as well as ship a new model for 3D secure payment challenges.

One of the biggest benefits of this model is that we can set goals within each area and work even more closely with the Product Managers and Engineers in those teams. ML Scientists in spokes will be attending weekly planning and stand-up meetings with their own squads, and then sync back with everyone else so that we can remain aligned and share best practices.

One of the challenges since the beginning of the pandemic has been that we’ve moved to being a primarily distributed team, and have not yet all been in the same room together: we’re trying to overcome this by finding a good balance between synchronous and asynchronous work. We’re also at a stage where all Machine Learning Scientists report into me; this is one of the reasons why we’re starting to hire Machine Learning Managers, who can work closely with the Product, Engineering, Data, and Machine Learning folks in an area of Monzo.

Model governance: managing risk for high impact areas

In the past, many of the areas where we were applying machine learning did not require very strict model governance. For example, when we would ship a new help article ranking model, we could focus on demonstrating the impact it had via an A/B test. As we move into more critical areas, this has started to change.

In the last few months, we’ve been working on a large governance project for a new risk assessment model in the financial crime space, which involves several iterations of review. We’ve learned a lot about what steps we needed to take, and the level of rigour required as our models are reviewed by internal and external auditors. Notably, governance does not replace any of the technical approaches that we use to validate a model’s performance — such as shadow mode deployments — the two work hand in hand to complement one another.

At a distance, governance may sound somewhat dry; however, I see this as another sign that we’re working on truly impactful problems, and also a process that can be optimised itself as we tread this path more than once.

Our machine learning infrastructure is continuing to evolve

The broad way that we ship machine learning has not radically changed: we do a lot of feature engineering and analytics work in our data stack, we train models in the GCP AI Platform, and we ship live inference models as Python micro services in our AWS production stack.

One of the main things that has changed since last year is that we are now using batch machine learning jobs more frequently than before. These are jobs that run predictions over a set of eligible users on a given cadence (such as daily or weekly) and is often the most common pattern of machine learning in the industry. We were not using it before because the types of problems we were solving required live inference (e.g., making a prediction on a transaction), but we are now increasingly working in areas where batch is the right approach. For example, we’ve recently been evaluating a model that aims to detect consumer vulnerability (the Financial Conduct Authority has a paper about the topic).

The second biggest change, which I noted above, is that a Backend Engineer who has been with Monzo for several years (Charlie) has joined the team. Previously, if we had a piece of infrastructure that we needed, we would build it ourselves. Doing this type of work ourselves makes for a difficult decision: when do we build or iterate on infrastructure vs. ship a new model?

For example, at the end of last year I blogged about building a feature store. This system now powers six critical machine learning systems across several areas of Monzo, and Charlie optimised the system to speed up its data ingestion by a factor of x3000. Faster and more reliable data ingestion means fresher features, which means better predictions and opens the door to use our feature store for more use cases: so this optimisation work has a cascading impact.

In the near future, we’re going to revisit some of our other components (e.g. serving models in Python, model artifact management) to seek out similar opportunities. In my view, this is the foundation for a future ML Ops team at Monzo — and it will be critical for these two areas to stay very close to one another.

What are the key takeaways?

Internally, we joke that a year at Monzo is like a decade elsewhere. It’s been difficult to write this post and feel that I’ve done justice to the incredible work that the team has delivered.

We’re a growing team that is becoming more established in key parts of Monzo. The diversity of our team is one of our strengths, and this does not go unnoticed by the company’s leadership. The next phase of our growth involves adding four more Machine Learning Scientists into the mix and scaling the depth and breadth of the machine learning we do across the company.

We continue to avoid “throwing problems over the wall” and so we build systems and iterate on machine learning models while working closely with others instead of thinking about team boundaries. We continue to not be the type of machine learning team that is thinking about pushing forward the state-of-the-art or publishing at academic conferences; we measure our success by the impact we have on customers and Monzo.

If these sort of things excite you, head on over to our careers page!