DCI AI Hub — AI Tracker socialprotectionai.org/use-case/GBR-004
GBR-004 Exported 1 April 2026

UKMOD-Essex — ML-Enhanced Regional Tax-Benefit Microsimulation

Country: United Kingdom
Deployment Status: Pilot / Controlled Trial Phase
Confidence: Confirmed
Implementing Agency: Centre for Microsimulation and Policy Analysis (CeMPA), Institute for Social and Economic Research (ISER), University of Essex; in collaboration with Essex County Council (lead author Frimpong is affiliated with both)

Overview

UKMOD-Essex is a machine learning-enhanced extension of UKMOD, the United Kingdom's only fully open-access tax-benefit microsimulation model. It was developed by the Centre for Microsimulation and Policy Analysis (CeMPA) at the University of Essex in collaboration with Essex County Council. The system uses a Gradient Boosted Machine (GBM) algorithm, likely implemented in XGBoost judging by the format of its tree output, to address a fundamental limitation of national survey-based policy modelling: the Family Resources Survey (FRS), which underpins UKMOD, is statistically representative only at the Government Office Region (GOR) level, which makes policy analysis at the local authority or county level unreliable.

The GBM operates as a propensity score estimator within a three-stage hybrid pipeline. In the first stage, the algorithm is trained on a merged dataset combining the national FRS (53,577 individuals in 25,045 households for the whole UK) with commercially available household-level data from Experian (1,861,043 individuals in 738,993 households for Greater Essex, covering practically 100% of the population at postcode level). The GBM uses 12 core covariates, including categorised age, tenure type, household size, presence and number of children by age band, labour market activity status, and equivalised income (residualised to prevent it from disrupting covariate balance), plus 5 interaction terms capturing relationships such as age-by-retirement-status and children-by-household-size. For each household, the model estimates a propensity score between 0 and 1: the probability that the household belongs to the regional Experian dataset rather than the national FRS.
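The first stage can be illustrated with a toy example. The sketch below is a hedged, pure-Python illustration and not the authors' implementation: it stacks a handful of invented national and regional records, adds one interaction column, and fits a small gradient-boosted ensemble of decision stumps on the logistic loss to estimate the probability that a record comes from the regional file. The two features, the data, and the helper names (`with_interaction`, `fit_stump`, `fit_gbm`, `predict_propensity`) are assumptions made for illustration; a production pipeline would rely on a library such as XGBoost and the full covariate set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def with_interaction(row):
    # Hypothetical features: [age_band, household_size] plus their product.
    return row + [row[0] * row[1]]

def fit_stump(X, residuals):
    """Least-squares regression stump: best single split on one feature."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]

def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting on the logistic loss with stumps as base learners."""
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1.0 - base_rate))  # initial log-odds
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lv, rv = fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps

def predict_propensity(model, row):
    f0, learning_rate, stumps = model
    z = f0 + sum(learning_rate * (lv if row[j] <= t else rv)
                 for j, t, lv, rv in stumps)
    return sigmoid(z)

# Invented records: label 0 = national survey, label 1 = regional file.
national = [[0, 1], [0, 2], [1, 2], [1, 3], [2, 1], [2, 4]]
regional = [[1, 2], [2, 1], [2, 2], [2, 3], [1, 1], [2, 2]]
X = [with_interaction(r) for r in national + regional]
y = [0] * len(national) + [1] * len(regional)

model = fit_gbm(X, y)
propensities = [predict_propensity(model, row) for row in X]
```

In this toy data the regional records skew towards the older age bands, so the fitted score is higher for an older two-person household than for a young single-person one.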

In the second stage, these propensity scores are converted to Inverse Probability Weights (IPW). The weights are stabilised and capped at the 99th percentile, with the top 1% trimmed, and nearest-neighbour matching with a caliper restriction is then applied. In the third stage, Iterative Proportional Fitting (IPF, also known as raking) calibrates the weights against official ONS population statistics for Greater Essex (1,841,192 individuals in 771,189 households across 14 districts, including the unitary authorities of Southend-on-Sea and Thurrock) so that the marginal distributions match for age groups, employment status, and household composition.
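The second and third stages can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the working paper's procedure: it turns propensity scores into odds weights p / (1 - p) for national records, rescales them to mean one as a simple stand-in for stabilisation, caps them at the 99th percentile, and then rakes them to two invented margins. The matching step is omitted, and the function names, categories, and target totals are all hypothetical.

```python
def odds_weights(propensities):
    """Odds weights p / (1 - p), rescaled to mean one (assumed stabilisation)."""
    raw = [p / (1.0 - p) for p in propensities]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def cap_at_percentile(weights, pct=99):
    """Cap (winsorise) weights at the given integer percentile."""
    ordered = sorted(weights)
    # Nearest-rank percentile, computed with integer arithmetic.
    idx = (pct * len(ordered) + 99) // 100 - 1
    cap = ordered[idx]
    return [min(w, cap) for w in weights]

def rake(weights, categories, targets, n_iter=200, tol=1e-10):
    """Iterative Proportional Fitting over one-dimensional margins.

    categories: one dict per record, e.g. {"age": "young", "work": "employed"}
    targets:    {margin: {level: population total}}
    """
    w = list(weights)
    for _ in range(n_iter):
        max_shift = 0.0
        for margin, levels in targets.items():
            totals = {level: 0.0 for level in levels}
            for wi, cat in zip(w, categories):
                totals[cat[margin]] += wi
            for i, cat in enumerate(categories):
                factor = levels[cat[margin]] / totals[cat[margin]]
                w[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # every margin already matches its target
            break
    return w

def margin_totals(weights, categories, margin):
    totals = {}
    for wi, cat in zip(weights, categories):
        totals[cat[margin]] = totals.get(cat[margin], 0.0) + wi
    return totals

# Invented national records with hypothetical propensity scores.
propensities = [0.2, 0.5, 0.6, 0.8]
categories = [
    {"age": "young", "work": "employed"},
    {"age": "young", "work": "not_employed"},
    {"age": "old", "work": "employed"},
    {"age": "old", "work": "not_employed"},
]
# Invented population totals; both margins sum to the same total (100).
targets = {
    "age": {"young": 60.0, "old": 40.0},
    "work": {"employed": 70.0, "not_employed": 30.0},
}

start = cap_at_percentile(odds_weights(propensities))
final = rake(start, categories, targets)
```

Raking adjusts one margin at a time, so each pass slightly disturbs the margins fitted earlier; repeating the passes until the adjustment factors approach one brings all margins into agreement with the targets.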

The GBM was chosen over Random Forest after testing both approaches: GBM achieved lower Standardized Mean Differences (SMDs) across all covariates, handled 10+ socioeconomic predictors without degradation, and captured complex multi-way interactions more effectively. Random Forest's adjusted covariate balance was actually worse than the unadjusted baseline when using more than 6 covariates.
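The balance diagnostic used in this comparison can be made concrete. The sketch below computes a Standardized Mean Difference between a target sample and a (re)weighted source sample for one covariate; it uses one common definition, with the pooled unweighted standard deviation as the denominator, which is an assumption rather than the formula confirmed by the working paper. The data, weights, and function names are invented.

```python
import math

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def smd(target_values, source_values, source_weights=None):
    """Standardized Mean Difference of one covariate (target minus source)."""
    if source_weights is None:
        source_weights = [1.0] * len(source_values)
    target_mean = sum(target_values) / len(target_values)
    source_mean = weighted_mean(source_values, source_weights)
    # Denominator fixed at the unweighted pooled SD so that the before and
    # after figures are measured on the same scale.
    pooled_sd = math.sqrt((variance(target_values) + variance(source_values)) / 2.0)
    return (target_mean - source_mean) / pooled_sd

# Invented binary covariate, e.g. 1 = owner-occupier.
regional = [1, 1, 0, 0]             # target sample: 50% owners
national = [0, 0, 0, 1]             # source sample: 25% owners
weights = [2 / 3, 2 / 3, 2 / 3, 2]  # hypothetical weights from the pipeline

smd_before = smd(regional, national)
smd_after = smd(regional, national, weights)
```

In this toy case the weights restore the owner share to 50%, so the adjusted difference falls to zero, well inside the 0.1 threshold commonly used to judge balance.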

The resulting reweighted dataset enables UKMOD's standard rules-based tax-benefit simulation engine to produce regional estimates of employment income distributions, tax liabilities, benefit entitlements, and distributional impacts of policy reforms at the Essex level. Macro-validation against external benchmarks shows strong alignment: median monthly employment income of GBP 2,392 (UKMOD-Essex) versus GBP 2,535 (ASHE), and self-employment income matching the Survey of Personal Incomes benchmark exactly at GBP 3.20 billion when filtered to comparable definitions.
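The size of the employment income gap can be checked with simple arithmetic. The snippet below only restates the two medians quoted above and expresses their difference relative to the ASHE figure; the variable names are illustrative.

```python
ukmod_essex_median = 2392  # GBP per month, UKMOD-Essex
ashe_median = 2535         # GBP per month, ASHE benchmark

# Relative gap of the simulated median against the benchmark.
relative_gap = (ukmod_essex_median - ashe_median) / ashe_median
print(f"{relative_gap:.1%}")  # prints -5.6%
```

The simulated median is therefore roughly 5.6% below the ASHE benchmark.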

The system is part of the EUROMOD family of models jointly developed with the European Commission. UKMOD is released under a CC BY-NC-ND 4.0 licence (free, non-commercial), and the EUROMOD software engine is open-source under the EUPL-1.2 licence. The lead author, Rejoice Frimpong, is affiliated with both Essex County Council and CeMPA, confirming direct local government involvement in the development. The methodology is described in CeMPA Working Paper 9/25 (August 2025). The authors note future directions including neural networks, XGBoost variants, hybrid ensemble models, and application to dynamic (not just static) microsimulation.

Classification

AI Capabilities

Synthetic dataset generation (primary)
Classification

Use Cases

Policy analysis, learning and M&E (primary)

Social Protection Functions

Policy: Legal and policy frameworks (primary)
Programme design: Benefits and service package
SP Pillar (Primary): Social assistance
SP Pillar (Secondary): Social insurance

Programme Details

Programme Name: UKMOD (UK Tax-Benefit Microsimulation Model)
Programme Type: Other
System Level: Policy

UKMOD is the UK's only fully open-access tax-benefit microsimulation model, covering all four nations (England, Scotland, Wales, Northern Ireland). It is part of the EUROMOD family jointly developed with the European Commission and simulates the effects of taxes and social benefits on household incomes and work incentives. Both the model and the underlying data are freely available. An online version (UKMOD Explore) allows non-specialists to design and run policy scenarios.

Implementation Details

Implementation Type: Classical ML
Lifecycle Stage: Integration and Deployment
Model Provenance: Developed in-house
Compute Environment: On-premise
Sovereignty Quadrant: I (Sovereign AI Zone)
Data Residency: Domestic
Cross-Border Transfer: None

Risk & Oversight

Decision Criticality: Low
Human Oversight: HITL
Development Process: Fully in-house
Highest Risk Category: Model-related risks
Risk Assessment Status: Informal assessment

Risk Dimensions

Data-related risks

Data or concept drift
Data quality failure

Model-related risks

Model misspecification
Opacity or limited explainability

Impact Dimensions

Autonomy, human dignity and due process

Opaque or unexplained decision

Equality, non-discrimination, fairness and inclusion

Reinforcement of structural inequity

Safeguards

Data minimisation controls
Independent evaluation

Deployment & Outcomes

Deployment Status: Pilot / Controlled Trial Phase
Year Initiated: 2025
Scale / Coverage: Greater Essex region, 1,841,192 individuals in 771,189 households (ONS, March 2023). Covers 14 Greater Essex districts including Southend-on-Sea and Thurrock unitary authorities. National FRS input: 53,577 individuals in 25,045 households. Experian regional data: 1,861,043 individuals in 738,993 households.
Funding Source: University of Essex / CeMPA (ESRC-funded centre). EUROMOD engine jointly funded with European Commission.
Technical Partners: Alliance for Microsimulation and Policy Analysis CIC (co-developer of UKMOD); Experian (commercial regional data provider); EUROMOD software engine (open-source, EUPL-1.2 licence)

Outcomes / Results

Macro-validation shows strong alignment with external benchmarks. Median monthly employment income is GBP 2,392 in UKMOD-Essex versus GBP 2,535 in ASHE, which is within the expected range given the different data sources and definitions. Self-employment income matches the SPI benchmark exactly (GBP 3.20 billion) when filtered to comparable definitions. Post-weighting Standardized Mean Differences are below 0.1 for most variables. GBM outperformed Random Forest on covariate balance diagnostics.

Challenges

GBM training is computationally expensive. ML introduces complexity in model selection, overfitting prevention, and interpretability. Performance depends on the quality of the Experian data, so the approach may not replicate in regions with sparse or inconsistent commercial data. Post-weighting raking is still needed, indicating that the GBM alone cannot capture all dimensions of population heterogeneity.

Sources

  1. SRC-003-GBR-004 CeMPA (n.d.) 'UKMOD', Centre for Microsimulation and Policy Analysis, University of Essex.
    https://www.microsimulation.ac.uk/ukmod/
  2. SRC-001-GBR-004 Frimpong, R. & Richiardi, M. (2025) 'Machine learning regionalisation of input data for microsimulation models: An application of a hybrid GBM / IPF method to build a tax-benefit model for the Essex region in the UK', CeMPA Working Paper 9/25, University of Essex.
    https://www.iser.essex.ac.uk/wp-content/uploads/files/working-papers/cempa/cempa9-25.pdf
  3. SRC-002-GBR-004 Richiardi, M., Collado, D. & Popova, D. (2021) 'UKMOD – A new tax-benefit model for the four nations of the UK', International Journal of Microsimulation, 14(1), pp. 92-108.
    https://microsimulation.pub/articles/00243

How to Cite

DCI AI Hub (2026). 'UKMOD-Essex — ML-Enhanced Regional Tax-Benefit Microsimulation', AI Hub AI Tracker, case GBR-004. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/GBR-004


Digital Convergence Initiative - AI Hub

Responsible, ethical use of AI in social protection

MarketImpact Platform developed by MarketImpact Digital Solutions
Co-funded by European Union and German Cooperation. Coordinated by GIZ, ILO, The World Bank, Expertise France, and FIAP.