A “meshy” approach to Data: Enabling 100+ teams to build Data Models

Every decision at Monzo, from fraud detection to customer support to launching in a new market, relies on our data warehouse. We use dbt (data build tool) to define how raw data gets transformed into clean, reliable datasets or “models”. These reusable building blocks power the dashboards, reports, machine learning models, and regulatory submissions that teams across the company depend on.

At Monzo, over 100 independent, empowered teams contribute to our data warehouse of 12,000+ dbt models. Data health is owned collectively across these teams. That kind of distributed ownership is powerful, but it's also hard to get right at scale. Additionally, as AI-assisted coding becomes the norm and everyone can contribute to production dbt projects, the question becomes: how do we make sure the outputs are still performant, consistent, and high quality?

Rather than reverting to central ownership, our answer is to make efficient modelling the default through standardised architecture and tooling that sits on top. We’ve introduced clear modelling layers, governed “interfaces” for cross-team sharing, and automated standards enforced directly in CI (continuous integration). Every team can ship seamlessly and with confidence.

Over the past year, we've been rebuilding our data architecture from the ground up based on these principles. As a result, thousands of models have been migrated to the new framework, hundreds of formalised data-sharing interfaces now govern how teams share data, data delivery (landing) times have started to drop significantly, and warehouse cost growth has reversed.

Before: the unconstrained growth era

Monzo's data warehouse grew alongside the company. As we expanded into new markets and launched new products, the number of people contributing models scaled rapidly. For a while, this worked fine. But by the time we hit thousands of models and hundreds of contributors, we noticed a few opportunities to iterate and improve:

  • Performance optimisation opportunities: warehouse processing costs and data landing times were both on an upwards trajectory, with room for iterative improvement across the board.

  • Lack of shared conventions: with full autonomy over their modelling approaches, different teams had developed different conventions, styles and assumptions. The data warehouse had very few centralised guardrails or standards.

  • Cross-team dependencies: when teams shared data across domains, one team’s modelling choices rippled across the entire warehouse. A schema change here broke downstream models there.

After: three principles, one architecture

Before writing any code, we established three principles:

  • Be opinionated: Fewer choices lead to better outcomes at scale. Clear data model layer definitions, model types, and naming conventions mean that if you move from Borrowing to Operations, the patterns feel familiar.

  • Formalise data sharing: Teams inevitably need each other's data. We replaced implicit dependencies with interfaces: explicitly declared, governed models that serve as the single source of truth for shared data.

  • Automate, don't gatekeep: If tooling enforces best practices by default and CI catches patterns that we want to discourage before they land, reviewers can focus on business logic and design decisions rather than structural compliance.

The layers: object-oriented data modelling

Instead of thinking about models as "staging" or "intermediate", we reframed them as four layers organised around business objects:

  • Landing: flattens raw event payloads into clean timelines per object. Fully automated, no hand-written SQL.

  • Normalised: the "object-oriented" heart: each model encapsulates a single entity's attributes and lifecycle (SCD Type-2 history, current state, or immutable events). Also generated automatically.

  • Logical: where we combine normalised objects into richer structures: joining users with accounts, transactions with disputes, or aggregating at the granularity the business needs. This is where most developer effort goes - pure business logic, not data wrangling.

  • Presentation: lightweight, consumer-shaped models for dashboards, regulatory reports, ML features, or third-party integrations.

Diagram of a one-way data pipeline flowing left to right from raw sources through landing, normalized, logical, and presentation layers. Modelgen automates the landing and normalized layers. Only the normalized and logical layers output interfaces for other teams to consume.
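To make the logical layer concrete, here's a rough sketch of what such a dbt model could look like - joining two normalised objects into a richer structure. The model and column names are illustrative, not Monzo's actual schema:

```sql
-- Hypothetical logical-layer model: joins normalised objects so
-- downstream consumers get one enriched view of accounts.
select
    accounts.account_id,
    accounts.user_id,
    users.signup_country,
    accounts.opened_at,
    accounts.closed_at
from {{ ref('normalised_account') }} as accounts
left join {{ ref('normalised_user') }} as users
    on accounts.user_id = users.user_id
```

Because the normalised inputs are generated automatically, the hand-written SQL here stays focused on business meaning rather than payload parsing.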

Structured cross-team data sharing with interfaces

In a warehouse with 100+ contributing teams, sharing data cleanly between domains is one of the hardest problems to solve. 

Previously, if you needed another team's data, you'd reference whichever model you could find that looked right - often something internal that could change without warning.

Instead, we introduced interfaces. An interface is a dbt model that a team explicitly declares as a governed contract for cross-team consumption. It's clearly documented, tested, and stable.

Only models in the normalised and logical layers can be declared as interfaces. Presentation-layer models can't - they're leaf nodes built for specific use cases.
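dbt has no built-in interface concept, and the post doesn't show Monzo's exact mechanism, but one common pattern is to flag governed models via `meta` properties in a schema file. A hypothetical sketch, with all field names illustrative:

```yaml
# Hypothetical interface declaration - keys are illustrative,
# not Monzo's actual convention.
models:
  - name: logical_account
    description: Governed interface for account data across teams.
    meta:
      interface: true        # declared for cross-team consumption
      owner_team: borrowing
    columns:
      - name: account_id
        description: Unique identifier for the account.
        tests:
          - unique
          - not_null
```

A declaration like this gives CI something concrete to check: anything not flagged as an interface can be blocked from cross-team references.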

Today we have hundreds of formalised interfaces across our migrated models. Practitioners can build on each other's work safely, without worrying about upstream changes they don’t know are coming.

Automating quality at scale

Once we'd redesigned our architecture, the next challenge was making sure every new model followed the same structure, conventions, and best practices - without requiring everyone to become an expert in the architecture, and without creating a central review bottleneck.

We solved this with two complementary systems: Modelgen for automated model generation, and data standards for automated quality enforcement.

Modelgen: declare once, generate everything

Modelgen is a command-line tool we built in-house. Instead of manually writing hundreds of lines of SQL and YAML, data practitioners describe an object once - its key identifiers, source events, and important fields - in a short YAML config.

This was only possible because of how Monzo's data ingestion from the backend is designed: our microservices emit events in a consistent, well-structured format. That uniformity lets us make strong assumptions about what raw data looks like before it ever reaches the warehouse, and generate correct, performant dbt models from a minimal description.

YAML code snippet for model generation configuring an account object. It defines source events for account creation and closure, and maps columns using the SQL JSON_VALUE function to extract data from a payload.
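The screenshot above isn't reproduced here, but based on its description, a Modelgen config along these lines seems plausible. All keys, event names, and paths are assumptions for illustration:

```yaml
# Hypothetical Modelgen config - structure and names are illustrative.
object: account
key: account_id
source_events:
  - event: account.created
  - event: account.closed
columns:
  - name: account_id
    expression: JSON_VALUE(payload, '$.account_id')
  - name: closed_at
    expression: JSON_VALUE(payload, '$.closed_at')
```

From a declaration like this, the tool can generate both the landing and normalised models, plus their accompanying YAML documentation and tests.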

Data standards: embedding best practices into CI/CD

Modelgen ensures models are structured correctly. Our data standards framework ensures they're built correctly - defining what "good" looks like and enforcing it automatically.

Among many other rules, we require that every model:

  • has a unique key - a field that identifies each individual row

  • has freshness tests - alerts when data stops arriving on time

  • is incremental (only processing new or changed data) unless explicitly exempt

  • has an explicit owner team and good documentation

  • follows naming and metadata conventions

These aren't just guidelines in our wiki. They're checks that run in CI on every pull request. New models are compliant from day one. Existing models are continuously monitored.

Automated GitHub bot comment flagging three blocking data standard failures for a dbt model: missing unique key, inadequate column descriptions, and missing partition config. Each flagged error includes a link to documentation guidance.
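As a minimal sketch of how such a check could work, here's a function that validates one model's configuration against a few of the standards above. The rule names, config fields, and exemption flag are illustrative assumptions, not Monzo's actual framework:

```python
# Hypothetical data-standards check, assuming each dbt model's
# configuration is available as a plain dict.

def check_model(config: dict) -> list[str]:
    """Return a list of blocking standard violations for one model."""
    failures = []
    if not config.get("unique_key"):
        failures.append("missing unique key")
    if not config.get("freshness_tests"):
        failures.append("missing freshness tests")
    # Incremental materialisation is required unless explicitly exempted.
    if (config.get("materialization") != "incremental"
            and not config.get("incremental_exempt")):
        failures.append("not incremental and no exemption declared")
    if not config.get("owner_team"):
        failures.append("missing owner team")
    return failures
```

In CI, a check like this would run against every model touched by a pull request and post any failures back as a blocking review comment.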

Where exceptions are necessary, they’re explicitly declared and justified - so governance stays fast without becoming brittle.

Results so far

We are currently about 30% through a company-wide migration to these approaches and systems, with a long road still ahead. Initial results have been encouraging: we've seen a ~40% cost reduction and ~25% faster landing times in some domains - but it's still early days.

We’re learning as we go and iterating, so we’ll be looking to share our takeaways and thoughts once we get closer to the finish line!

If you're building a data platform and wrestling with similar challenges - scaling distributed ownership, enforcing quality without bottlenecks, or designing for a future where anyone can contribute to data projects - we'd love to hear how you're approaching it.


Interested in a career at Monzo?

If what you’ve read here resonates and you’re passionate about making money work for everyone, we’re hiring data engineers, analysts, and many more roles across Monzo! Take a look at our careers page to see if we have the right role for you.
