Case study · Banking & financial services
A governed data foundation for risk reporting and customer analytics
Core banking, CRM, and risk systems unified under one semantic layer—so regulators, risk officers, and product teams drew from the same source of truth.
A retail bank with operations across multiple markets had accumulated years of point-to-point integrations between its core banking system, CRM, and loan management platform. Risk reporting relied on manually reconciled extracts; regulatory reviewers had raised findings on data lineage; and product and AML teams worked from different versions of the customer record. Leadership needed one trusted data foundation—not another reporting tool layered on top of fragmented pipes.
What leadership was trying to fix
Risk, finance, and compliance on different numbers
At the close of each reporting period, risk officers and the finance team would spend several days reconciling exposure figures, balance sheet positions, and customer counts before presenting to the board and regulators. The root cause was consistent: customer and account identifiers were not shared across systems, so joins were approximated, and every team applied its own business rules to the same underlying transactions.
Regulatory examiners had noted gaps in data lineage—the bank could not trace a reported credit exposure figure back to the originating transactions in the core banking system without manual intervention. Internal audit had flagged the same risk. The question was no longer whether to address it but how to do so without disrupting ongoing operations.
Friction in the data estate
Siloed systems, duplicated identifiers, and brittle batch jobs
The core banking system (CBS) keyed account and position data on an account-level identifier. The CRM held customer-level records under its own contact IDs. The loan origination system used a third key scheme inherited from an earlier platform migration. No canonical customer master existed; analysts maintained mapping tables in spreadsheets that went stale when customers changed names, merged accounts, or were onboarded through different channels.
Nightly batch jobs extracted flat files from the CBS and loaded them into a staging schema in the data warehouse. Transformations were embedded in stored procedures with no test coverage and no lineage metadata. When a pipeline failed at 2 AM, the on-call team had no reliable way to assess downstream impact before the reporting window opened. AML transaction monitoring consumed a separate feed with a different refresh cadence, so the customer profile the monitoring team saw sometimes differed from the one the relationship manager saw in CRM.
Design of the response
Medallion architecture with a conformed customer master at the centre
We implemented a medallion-style data platform on the bank’s cloud analytics stack: Bronze for raw, replayable landing of all source feeds with arrival timestamps and immutable storage; Silver for conformed and validated entities; Gold for curated marts serving specific consumers.
The foundation of Silver was the customer golden record. We built a deterministic entity resolution process that matched CBS account-holder records, CRM contacts, and loan system borrowers using a hierarchy of match rules—national ID, date of birth plus name, and address normalisation—with explicit confidence scoring and a quarantine queue for low-confidence matches requiring analyst review. The output was a stable, bank-wide customer key (BankCID) propagated across all downstream entities: accounts, positions, transactions, and products.
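The rule hierarchy described above can be sketched roughly as follows. This is an illustration only, not the bank's implementation: the record fields, the confidence values, the AUTO_ACCEPT threshold, and the function names are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartyRecord:
    source: str                    # e.g. "CBS", "CRM", "LOANS"
    source_id: str
    national_id: Optional[str]
    date_of_birth: Optional[str]   # ISO date string
    name: str
    address: str


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(cleaned.split())


# Assumed threshold: matches scoring below it go to analyst review.
AUTO_ACCEPT = 0.85


def match(a: PartyRecord, b: PartyRecord) -> Optional[Tuple[str, float]]:
    """Return (rule, confidence) for the first rule in the hierarchy that fires."""
    # Rule 1: identical national ID (illustrative confidence 1.00).
    if a.national_id and a.national_id == b.national_id:
        return ("national_id", 1.00)
    # Rule 2: same date of birth and same normalised name (illustrative 0.90).
    if (a.date_of_birth and a.date_of_birth == b.date_of_birth
            and normalise(a.name) == normalise(b.name)):
        return ("dob_and_name", 0.90)
    # Rule 3: same normalised address only (illustrative 0.60, never auto-accepted).
    if normalise(a.address) and normalise(a.address) == normalise(b.address):
        return ("address", 0.60)
    return None


def route(a: PartyRecord, b: PartyRecord) -> str:
    """Decide whether a candidate pair is merged, quarantined, or left apart."""
    result = match(a, b)
    if result is None:
        return "no_match"
    _rule, confidence = result
    return "accept" if confidence >= AUTO_ACCEPT else "quarantine"
```

Because the rules are evaluated in a fixed order and each returns a fixed score, the same pair of records always produces the same outcome, which is what keeps the resulting customer key stable across reloads.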
With consistent keys in place, Silver conformed the following entity families:
- Customer and counterparty: individuals, corporates, and internal legal entities with relationship hierarchies for group exposure aggregation.
- Account and product: current accounts, savings, loans, and credit facilities aligned to a product taxonomy shared by finance and risk.
- Position and balance: end-of-day balances and intraday snapshots where source systems supported them; foreign currency positions restated to a reporting currency using a managed rate table.
- Transaction: debit and credit events enriched with product and counterparty context; hash-keyed for idempotent reloads.
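As a sketch of what hash-keyed, idempotent reloads can look like: the key is derived from the business fields that identify a transaction, so replaying the same source file does not create duplicates. The field names, the choice of SHA-256, and the in-memory dictionary standing in for the Silver table are assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def transaction_key(row: Dict[str, str]) -> str:
    """Build a deterministic key from the identifying fields of a transaction."""
    # Hypothetical identifying fields; a real contract would define these per source.
    fields = ["source_system", "source_txn_id", "account_id",
              "booking_date", "amount_minor", "currency"]
    canonical = "|".join(str(row[f]).strip().upper() for f in fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load(store: Dict[str, Dict[str, str]], rows: List[Dict[str, str]]) -> int:
    """Upsert rows by hash key and return how many keys were new."""
    inserted = 0
    for row in rows:
        key = transaction_key(row)
        if key not in store:
            inserted += 1
        store[key] = row  # overwriting with identical content is harmless
    return inserted


batch = [
    {"source_system": "CBS", "source_txn_id": "T-1", "account_id": "ACC-9",
     "booking_date": "2024-01-31", "amount_minor": "-2500", "currency": "EUR"},
    {"source_system": "CBS", "source_txn_id": "T-2", "account_id": "ACC-9",
     "booking_date": "2024-01-31", "amount_minor": "10000", "currency": "EUR"},
]
silver_transactions: Dict[str, Dict[str, str]] = {}
first_load = load(silver_transactions, batch)
replay = load(silver_transactions, batch)  # same file replayed from Bronze
```

Here the replay adds no rows, which is the property that lets a failed nightly run be rerun without a manual clean-up step.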
Gold served three distinct consumer groups with dedicated marts: a credit risk mart for regulatory exposure aggregation aligned with BCBS 239 principles (single source of truth, completeness checks, lineage to Bronze); a customer 360 mart for AML transaction monitoring and relationship-level product analytics; and a finance and balance sheet mart for P&L attribution, net interest margin, and balance sheet reconciliation against the general ledger.
Data quality rules were concentrated where errors had the highest cost: account-to-customer linkage completeness, position-to-ledger reconciliation tolerances, and reference data alignment for product codes and country classifications.
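Two of these rules lend themselves to a short sketch: linkage completeness as a ratio, and ledger reconciliation as a tolerance check. The field name bank_cid, the function names, and the tolerance in the usage lines are hypothetical; actual thresholds would be agreed with finance and risk.

```python
from decimal import Decimal
from typing import Dict, List, Optional


def linkage_completeness(accounts: List[Dict[str, Optional[str]]]) -> float:
    """Share of accounts that carry a non-empty customer key."""
    if not accounts:
        return 1.0  # nothing to link, so nothing is unlinked
    linked = sum(1 for account in accounts if account.get("bank_cid"))
    return linked / len(accounts)


def reconcile(positions_total: Decimal, ledger_total: Decimal,
              tolerance: Decimal) -> Dict[str, object]:
    """Compare a position total to the ledger and flag breaks beyond tolerance."""
    difference = positions_total - ledger_total
    return {
        "difference": difference,
        "within_tolerance": abs(difference) <= tolerance,
    }


accounts = [
    {"account_id": "A1", "bank_cid": "CID-1"},
    {"account_id": "A2", "bank_cid": "CID-2"},
    {"account_id": "A3", "bank_cid": None},      # unlinked account
    {"account_id": "A4", "bank_cid": "CID-1"},
]
completeness = linkage_completeness(accounts)
check = reconcile(Decimal("1000.00"), Decimal("1000.04"), Decimal("0.05"))
```

Decimal is used rather than float so that small reconciliation differences are not distorted by binary rounding.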
How we ran delivery
Source-by-source onboarding with end-to-end verification
We onboarded sources in prioritised waves rather than attempting a single cutover. The CBS was first because it anchored account and position data; the CRM followed once the entity resolution process was validated against a representative sample reviewed by the data stewardship team. The loan system was third, after reconciling its historical ID mappings.
Each wave followed a consistent pattern: data contracts documented at source boundaries; automated quality checks at Bronze ingestion, Silver transformation, and Gold publication; reconciliation reports comparing Gold aggregates to the authoritative source system for a defined validation window; and signed-off runbooks before the wave went live in production.
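A data contract at a source boundary can be as simple as a declared set of required fields and types that every incoming row is checked against. The contract below is a hypothetical example for a CBS account extract, not the one used in the engagement.

```python
from datetime import date
from typing import Dict, List

# Hypothetical contract: field name -> required Python type.
CBS_ACCOUNT_CONTRACT: Dict[str, type] = {
    "account_id": str,
    "balance_minor": int,
    "currency": str,
    "as_of_date": date,
}


def violations(row: dict, contract: Dict[str, type] = CBS_ACCOUNT_CONTRACT) -> List[str]:
    """Return a list of contract breaches for one row (empty list means the row passes)."""
    problems = []
    for field, expected in contract.items():
        value = row.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


good_row = {"account_id": "ACC-9", "balance_minor": 125000,
            "currency": "EUR", "as_of_date": date(2024, 1, 31)}
bad_row = {"account_id": "ACC-10", "balance_minor": "125000",
           "as_of_date": date(2024, 1, 31)}
```

Running the check at Bronze ingestion means a breaking change in a source extract is caught before it reaches Silver or Gold.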
Lineage was wired end-to-end from Gold metrics back to Bronze landing records, enabling the risk team to answer a regulator’s lineage question by navigating the metadata catalogue rather than manually tracing SQL chains. We handed the operations team ownership of each pipeline before moving to the next wave—preventing the pattern where a new platform is delivered but only the implementers know how it works.
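Answering a lineage question from the catalogue amounts to walking a dependency graph upstream until only landing datasets remain. The sketch below assumes the catalogue can be exported as a mapping from each dataset to its direct inputs; the dataset names are invented for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical catalogue export: dataset -> datasets it is built from.
UPSTREAM: Dict[str, List[str]] = {
    "gold.credit_risk.exposure": ["silver.position", "silver.customer"],
    "silver.position": ["bronze.cbs.positions"],
    "silver.customer": ["bronze.cbs.account_holders", "bronze.crm.contacts"],
}


def trace_to_sources(dataset: str, upstream: Dict[str, List[str]] = UPSTREAM) -> List[str]:
    """Breadth-first walk upstream; return datasets that have no recorded inputs."""
    seen = {dataset}
    queue = deque([dataset])
    sources = []
    while queue:
        current = queue.popleft()
        parents = upstream.get(current, [])
        if not parents:
            sources.append(current)  # nothing upstream: a landing dataset
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


origins = trace_to_sources("gold.credit_risk.exposure")
```

With the example mapping, the exposure metric resolves to the two CBS landing datasets and the CRM contacts landing dataset.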
Impact
Regulatory-grade lineage and a reconciliation process the team could own
Risk officers could trace any reported exposure figure to the originating CBS records through a documented lineage chain, a direct response to the earlier regulatory finding. Manual pre-close reconciliation, which had previously taken several days, shortened significantly: Gold mart outputs were checked automatically against the CBS and general ledger within agreed tolerances, so the team reviewed only the exceptions rather than every figure.
The AML transaction monitoring team and the product analytics team drew from the same customer and transaction entities for the first time, eliminating the divergence in customer counts and risk flags that had previously required weekly alignment calls. The finance team retired several Excel-based bridging extracts and the associated correction risk at period close.
The conformed customer master (BankCID) became an infrastructure asset the bank could build on: onboarding new analytics use cases—liquidity stress testing, marketing propensity models, regulatory reporting extensions—required connecting a new consumer to an already-verified entity set rather than re-solving the identity problem from scratch.
Specific figures are client-internal; the directional outcome was a reporting foundation the bank’s data governance function could attest to, and a platform team that could operate and extend it independently.
