UKMOD-Essex — ML-Enhanced Regional Tax-Benefit Microsimulation
Overview
UKMOD-Essex is a machine learning-enhanced extension of UKMOD, the United Kingdom's only fully open-access tax-benefit microsimulation model, developed by the Centre for Microsimulation and Policy Analysis (CeMPA) at the University of Essex in collaboration with Essex County Council. The system uses a Gradient Boosted Machine (GBM) algorithm, likely implemented in XGBoost judging by the format of its tree output, to address a fundamental limitation of national survey-based policy modelling. The Family Resources Survey (FRS), which underpins UKMOD, is statistically representative only at the Government Office Region (GOR) level, so sub-national policy analysis at the local authority or county level is unreliable.
The GBM operates as a propensity score estimator within a three-stage hybrid pipeline. In the first stage, the algorithm is trained on a merged dataset combining the national FRS (53,577 individuals in 25,045 households for the whole UK) with commercially available household-level data from Experian (1,861,043 individuals in 738,993 households for Greater Essex, covering practically 100% of the population at postcode level). The GBM uses 12 core covariates — including categorised age, tenure type, household size, presence and number of children by age band, labour market activity status, and equivalised income (residualised to prevent it disrupting covariate balance) — plus 5 interaction terms capturing complex relationships such as age-by-retirement-status and children-by-household-size. The model estimates a propensity score between 0 and 1 for each household, representing the probability of belonging to the regional Experian dataset versus the national FRS.
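The first stage can be illustrated with a toy example. The sketch below is not the authors' implementation (the source suggests XGBoost); it is a minimal pure-Python gradient boosting of depth-1 trees on logistic loss, fitted to synthetic data. The two covariates (age band and household size), the sample sizes, the skew in the regional sample, and all function names are assumptions made for illustration. The label `z` is 1 for a household from the regional sample and 0 for one from the national sample, so the fitted score plays the role of the propensity score described above.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error.

    Assumes at least one feature takes more than one value.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml = sum(left) / len(left)
            mr = sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_gbm(X, z, n_rounds=50, lr=0.3):
    """Boost stumps on the negative gradient of the logistic loss."""
    p0 = sum(z) / len(z)
    f0 = math.log(p0 / (1.0 - p0))          # initial log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [zi - sigmoid(fi) for zi, fi in zip(z, F)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        F = [fi + lr * (ml if row[j] <= t else mr) for row, fi in zip(X, F)]
    return f0, lr, stumps

def propensity(model, row):
    """Estimated probability that a household belongs to the regional sample."""
    f0, lr, stumps = model
    f = f0 + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)
    return sigmoid(f)

random.seed(0)
# Hypothetical covariates: [age_band (0-5), household_size (1-5)].
national = [[random.randrange(6), random.randint(1, 5)] for _ in range(200)]
# The synthetic regional sample is skewed towards older age bands.
regional = [[random.choices(range(6), weights=[1, 1, 2, 3, 4, 5])[0],
             random.randint(1, 5)] for _ in range(200)]
X = national + regional
z = [0] * len(national) + [1] * len(regional)

model = fit_gbm(X, z)
scores = [propensity(model, row) for row in X]
mean_national = sum(scores[:200]) / 200
mean_regional = sum(scores[200:]) / 200
print(round(mean_national, 3), round(mean_regional, 3))
```

In this toy setup the regional households should receive higher average scores than the national ones, because age band separates the two synthetic samples.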
In the second stage, these propensity scores are converted to Inverse Probability Weights (IPW), which are stabilised and capped at the 99th percentile (trimming the top 1% of weights); nearest-neighbour matching with a caliper restriction is also applied. In the third stage, Iterative Proportional Fitting (IPF, also known as raking) calibrates the weights against official ONS population statistics for Greater Essex (1,841,192 individuals in 771,189 households across 14 districts, including the unitary authorities of Southend-on-Sea and Thurrock) so that the marginal distributions match for age groups, employment status, and household composition.
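The second and third stages can be sketched on toy data. The weight formula here is an assumption, since the source does not state it: national-sample households receive the propensity odds e/(1-e), multiplied by a stabilising constant (1-p)/p, where p is the regional share of the stacked data. The cap is implemented as winsorising at a nearest-rank 99th percentile, which is one possible reading of the capping step. The raking function, the category labels, the scores, and the target margins are all hypothetical, and nearest-neighbour matching is omitted.

```python
import math

def stabilised_odds_weights(scores, regional_share):
    """Odds weights e/(1-e) for national-sample units, scaled by (1-p)/p."""
    ratio = (1.0 - regional_share) / regional_share
    return [ratio * e / (1.0 - e) for e in scores]

def cap_at_percentile(weights, pct=99.0):
    """Winsorise: replace weights above the nearest-rank percentile by that value."""
    ordered = sorted(weights)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    cap = ordered[rank - 1]
    return [min(w, cap) for w in weights]

def rake(records, weights, targets, max_iter=200, tol=1e-10):
    """Iterative proportional fitting: rescale weights margin by margin
    until every weighted category total matches its target."""
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for var, margin in targets.items():
            totals = {cat: 0.0 for cat in margin}
            for rec, wi in zip(records, w):
                totals[rec[var]] += wi
            factors = {cat: margin[cat] / totals[cat] for cat in margin}
            w = [wi * factors[rec[var]] for rec, wi in zip(records, w)]
            worst = max(worst, max(abs(f - 1.0) for f in factors.values()))
        if worst < tol:
            break
    return w

# Six hypothetical national-sample households with made-up propensity scores.
records = [
    {"age": "under_65", "employed": "yes"},
    {"age": "under_65", "employed": "yes"},
    {"age": "under_65", "employed": "no"},
    {"age": "65_plus", "employed": "yes"},
    {"age": "65_plus", "employed": "no"},
    {"age": "65_plus", "employed": "no"},
]
scores = [0.30, 0.45, 0.50, 0.55, 0.60, 0.80]

raw = stabilised_odds_weights(scores, regional_share=0.5)
capped = cap_at_percentile(raw)  # with only six units nothing exceeds the cap

# Hypothetical population margins; both must sum to the same total.
targets = {
    "age": {"under_65": 60.0, "65_plus": 40.0},
    "employed": {"yes": 55.0, "no": 45.0},
}
final = rake(records, capped, targets)

def weighted_total(var, cat):
    return sum(w for rec, w in zip(records, final) if rec[var] == cat)

for var, margin in targets.items():
    for cat in margin:
        print(var, cat, round(weighted_total(var, cat), 4))
```

Raking only adjusts margins, so it preserves the relative weights within each cell that the propensity stage produced. This is one reason the two steps are complementary rather than interchangeable.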
The GBM was chosen over Random Forest after testing both approaches: GBM achieved lower Standardized Mean Differences (SMDs) across all covariates, handled 10+ socioeconomic predictors without degradation, and captured complex multi-way interactions more effectively. Random Forest's adjusted covariate balance was actually worse than the unadjusted baseline when using more than 6 covariates.
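The balance diagnostic used in this comparison can be sketched as follows. This is a minimal illustration assuming a common convention for the Standardized Mean Difference (difference in weighted means divided by the square root of the average of the two group variances); the source does not give the exact formula used, and the function names and toy values are hypothetical.

```python
import math

def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def smd(a, b, wa=None, wb=None):
    """Standardized mean difference between two (optionally weighted) samples."""
    wa = wa or [1.0] * len(a)
    wb = wb or [1.0] * len(b)
    mean_a, var_a = weighted_mean_var(a, wa)
    mean_b, var_b = weighted_mean_var(b, wb)
    pooled_sd = math.sqrt((var_a + var_b) / 2.0)
    return (mean_a - mean_b) / pooled_sd

# Toy covariate (for example an age band) in a national and a regional sample.
national_age = [1, 2, 3]
regional_age = [2, 3, 4]

unadjusted = smd(national_age, regional_age)
# Up-weighting the oldest national unit moves the weighted mean towards the regional one.
adjusted = smd(national_age, regional_age, wa=[1.0, 1.0, 4.0])
print(round(unadjusted, 3), round(adjusted, 3))
```

A reweighting method improves balance when the absolute adjusted SMD is smaller than the unadjusted one for each covariate; the Random Forest result reported above is the opposite case.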
The resulting reweighted dataset enables UKMOD's standard rules-based tax-benefit simulation engine to produce regional estimates of employment income distributions, tax liabilities, benefit entitlements, and distributional impacts of policy reforms at the Essex level. Macro-validation against external benchmarks shows strong alignment: median monthly employment income of GBP 2,392 (UKMOD-Essex) versus GBP 2,535 (ASHE), and self-employment income matching the Survey of Personal Incomes benchmark exactly at GBP 3.20 billion when filtered to comparable definitions.
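As a quick check on the employment income comparison, the relative gap between the two medians quoted above can be computed directly. The helper name is hypothetical; the figures are the ones reported in the text.

```python
def relative_gap(model_value, benchmark_value):
    """Signed gap of the model estimate relative to the benchmark."""
    return (model_value - benchmark_value) / benchmark_value

# Median monthly employment income in GBP: UKMOD-Essex versus ASHE.
gap = relative_gap(2392.0, 2535.0)
print(f"{gap:.1%}")  # prints -5.6%
```

The UKMOD-Essex median is therefore roughly 5.6% below the ASHE figure.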
The system is part of the EUROMOD family of models jointly developed with the European Commission. UKMOD is released under a CC BY-NC-ND 4.0 licence (free, non-commercial), and the EUROMOD software engine is open-source under the EUPL-1.2 licence. The lead author, Rejoice Frimpong, is affiliated with both Essex County Council and CeMPA, which indicates direct local government involvement in the development. The methodology is described in CeMPA Working Paper 9/25 (August 2025). The authors note future directions including neural networks, XGBoost variants, hybrid ensemble models, and application to dynamic (not just static) microsimulation.
Classification
AI Capabilities
Use Cases
Social Protection Functions
| SP Pillar (Primary) | Social assistance |
| SP Pillar (Secondary) | Social insurance |
Programme Details
| Programme Name | UKMOD (UK Tax-Benefit Microsimulation Model) |
| Programme Type | Other |
| System Level | Policy |
UKMOD is the UK's only fully open-access tax-benefit microsimulation model, covering all four nations (England, Scotland, Wales, Northern Ireland). It is part of the EUROMOD family jointly developed with the European Commission and simulates the effects of taxes and social benefits on household incomes and work incentives. Both the model and its underlying data are freely available. An online version (UKMOD Explore) allows non-specialists to design and run policy scenarios.
Implementation Details
| Implementation Type | Classical ML |
| Lifecycle Stage | Integration and Deployment |
| Model Provenance | Developed in-house |
| Compute Environment | On-premise |
| Sovereignty Quadrant | I — Sovereign AI Zone |
| Data Residency | Domestic |
| Cross-Border Transfer | None |
Risk & Oversight
| Decision Criticality | Low |
| Human Oversight | HITL |
| Development Process | Fully in-house |
| Highest Risk Category | Model-related risks |
| Risk Assessment Status | Informal assessment |
Risk Dimensions
Data-related risks
Model-related risks
Impact Dimensions
Autonomy, human dignity and due process
Equality, non-discrimination, fairness and inclusion
Safeguards
Deployment & Outcomes
| Deployment Status | Pilot / Controlled Trial Phase |
| Year Initiated | 2025 |
| Scale / Coverage | Greater Essex region — 1,841,192 individuals in 771,189 households (ONS, March 2023). Covers 14 Greater Essex districts including Southend-on-Sea and Thurrock unitary authorities. National FRS input: 53,577 individuals in 25,045 households. Experian regional data: 1,861,043 individuals in 738,993 households. |
| Funding Source | University of Essex / CeMPA (ESRC-funded centre). EUROMOD engine jointly funded with European Commission. |
| Technical Partners | Alliance for Microsimulation and Policy Analysis CIC (co-developer of UKMOD); Experian (commercial regional data provider); EUROMOD software engine (open-source, EUPL-1.2 license) |
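The coverage figures in the table above can be compared directly. The sketch below simply divides the Experian counts by the ONS counts; the variable names are hypothetical, and the ratios are only indicative because the two sources may differ in reference date and in how households are defined.

```python
# Counts as reported in the Scale / Coverage row.
ons = {"individuals": 1_841_192, "households": 771_189}
experian = {"individuals": 1_861_043, "households": 738_993}

# Ratio of Experian counts to ONS counts for Greater Essex.
coverage = {key: experian[key] / ons[key] for key in ons}
for key, ratio in coverage.items():
    print(key, f"{ratio:.1%}")
```

On these figures the Experian data contains about 1% more individuals than the ONS total and about 96% as many households.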
Outcomes / Results
Macro-validation shows strong alignment with external benchmarks. Median monthly employment income: GBP 2,392 (UKMOD-Essex) vs GBP 2,535 (ASHE) — within expected range given different data sources/definitions. Self-employment income matches SPI benchmark exactly when filtered to comparable definitions (GBP 3.20 billion). Post-weighting Standardized Mean Differences below 0.1 for most variables. GBM outperformed Random Forest on covariate balance diagnostics.
Challenges
GBM training is computationally expensive. ML introduces complexity in model selection, overfitting prevention, and interpretability. Performance depends on quality of Experian data — may not replicate in regions with sparse or inconsistent commercial data. Post-weighting raking still needed, indicating GBM alone cannot capture all dimensions of population heterogeneity.
Sources
- SRC-003-GBR-004 CeMPA (n.d.) 'UKMOD', Centre for Microsimulation and Policy Analysis, University of Essex.
  https://www.microsimulation.ac.uk/ukmod/
- SRC-001-GBR-004 Frimpong, R. & Richiardi, M. (2025) 'Machine learning regionalisation of input data for microsimulation models: An application of a hybrid GBM / IPF method to build a tax-benefit model for the Essex region in the UK', CeMPA Working Paper 9/25, University of Essex.
  https://www.iser.essex.ac.uk/wp-content/uploads/files/working-papers/cempa/cempa9-25.pdf
- SRC-002-GBR-004 Richiardi, M., Collado, D. & Popova, D. (2021) 'UKMOD – A new tax-benefit model for the four nations of the UK', International Journal of Microsimulation, 14(1), pp. 92-108.
  https://microsimulation.pub/articles/00243
How to Cite
DCI AI Hub (2026). 'UKMOD-Essex — ML-Enhanced Regional Tax-Benefit Microsimulation', AI Hub AI Tracker, case GBR-004. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/GBR-004