Machine-Learning Early Warning System for Income Support Recipients (Australia — Research Prototype)
Overview
The Machine-Learning Predictive Recertification Targeting system is a research prototype developed by Dario Sansone of the University of Exeter Business School and Anna Zhu of RMIT University, using Australian government administrative data to predict the intensity and duration of income support receipt among welfare enrollees in the Centrelink social security system. The system was designed to forecast the proportion of time each individual would remain on income support over a subsequent four-year horizon, with the explicit aim of identifying individuals at highest risk of long-term welfare dependency so that early intervention programmes and recertification review processes could be targeted more effectively (Sansone and Zhu, 2021, IZA DP 14377, p. 1).
The research uses the DOMINO (Data Over Multiple Individual Occurrences) longitudinal administrative dataset, which is maintained by the Australian Department of Social Services and captures individuals' interactions with the welfare system without identifiable information such as names and addresses (DSS Aristotle Metadata Registry). DOMINO contains daily-frequency records of income support receipt status from 2000 onwards, covering over 32 million persons who had any contact with the Centrelink system during that period (Sansone and Zhu, 2021, p. 5). The data are high quality because the government relies on this exact information to determine eligibility for payments: an individual's payment amount is a direct function of their income, wealth, savings, household structure, and other socio-economic factors, and these data are reconciled with Australian Tax Office records to ensure accuracy (Sansone and Zhu, 2021, p. 3). Recipients' eligibility for payments is assessed regularly, and recipients are required to report changes, for example to relationship status, earnings, or living conditions, within 14 days of the change (Sansone and Zhu, 2021, p. 10). The dataset includes information on demographics (sex, age, country of birth, and Indigenous status), household structure, government benefit receipt history by type, personal relationships, employment and underemployment, work instability, location and residential mobility, housing, education, income, and wealth (Sansone and Zhu, 2021, p. 13). In total, approximately 1,800 possible predictive features were constructed from these administrative records (Sansone and Zhu, 2021, p. 15).
The research was funded through Australian Research Council Linkage Project LP170100472 (Sansone and Zhu, 2021, acknowledgements footnote, p. 3). The analytical sample covers the period 2014 to 2018, using 2014 as the base year for predictive features and measuring welfare receipt intensity from 2015 to 2018. A 1% random sample of 50,615 individuals aged 15 to 66 was drawn from the full population for computational reasons (Sansone and Zhu, 2021, pp. 10-11).
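The outcome described above, the proportion of time spent on income support over 2015 to 2018, can be derived from daily or spell-level receipt records. The following is a minimal sketch, assuming spells arrive as inclusive (first_day, last_day) date pairs. The function name `receipt_intensity`, the input layout, and the exact window dates are illustrative assumptions, not the DOMINO schema or the authors' code.

```python
from datetime import date

# Assumed outcome window: the four calendar years after the 2014 base year.
WINDOW_START = date(2015, 1, 1)
WINDOW_END = date(2018, 12, 31)


def receipt_intensity(spells, start=WINDOW_START, end=WINDOW_END):
    """Share of days in [start, end] covered by at least one spell.

    `spells` is a list of (first_day, last_day) date pairs, both inclusive.
    Overlapping spells are merged so that no day is counted twice.
    """
    total_days = (end - start).days + 1
    # Clip each spell to the window and drop spells entirely outside it.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in spells if s <= end and e >= start
    )
    covered = 0
    last_counted = None  # latest day already counted
    for s, e in clipped:
        if last_counted is not None and s <= last_counted:
            # Skip the part of this spell that an earlier spell already covered.
            s = date.fromordinal(last_counted.toordinal() + 1)
        if s <= e:
            covered += (e - s).days + 1
            last_counted = e
    return covered / total_days


# Hypothetical recipient: one spell that starts before the window opens.
example = [(date(2014, 6, 1), date(2015, 12, 31))]
print(receipt_intensity(example))
```

A value of 0 means no receipt in the window and 1 means continuous receipt, which gives a continuous target suited to regression rather than classification.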
The technical approach uses an ensemble of off-the-shelf classical machine-learning algorithms: LASSO (a regularised regression method), Support Vector Regression, and Boosting (gradient-boosted trees allowing up to 6-way interactions between input variables). The data were split into an 80% training sample and a 20% hold-out test sample for out-of-sample performance evaluation (Sansone and Zhu, 2021, pp. 15-16). The ensemble method, which combines predictions from all three algorithms using weighted linear regression, achieved the best performance overall (Sansone and Zhu, 2021, p. 18).
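The combination step can be illustrated with a small, dependency-free sketch: hold out 20% of rows, regress the outcome on the base-model predictions in the training rows, and score the weighted combination on the held-out rows. This is a sketch under stated assumptions rather than the authors' implementation. The three base learners are not fitted here; noisy synthetic predictions stand in for them, and all function names are hypothetical. Whether the paper constrains the ensemble weights is not stated in this entry, so plain least squares with an intercept is assumed.

```python
import random


def train_test_split_indices(n, test_share=0.2, seed=0):
    """Shuffle row indices and hold out `test_share` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_share))
    return idx[n_test:], idx[:n_test]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ensemble_weights(base_preds, y):
    """Regress y on the base-model predictions (plus an intercept) by OLS."""
    rows = [[1.0] + list(p) for p in base_preds]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def ensemble_predict(weights, base_preds):
    """Intercept plus weighted sum of the base-model predictions."""
    return [weights[0] + sum(w * p for w, p in zip(weights[1:], row))
            for row in base_preds]


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat against y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: the outcome is a share in [0, 1] and the three
# "base model" predictions are the outcome plus independent noise.
rng = random.Random(42)
n = 2000
y = [rng.random() for _ in range(n)]
base = [[t + rng.gauss(0, s) for s in (0.20, 0.25, 0.30)] for t in y]

train, test = train_test_split_indices(n)
weights = fit_ensemble_weights([base[i] for i in train], [y[i] for i in train])

y_test = [y[i] for i in test]
r2_ensemble = r_squared(y_test, ensemble_predict(weights, [base[i] for i in test]))
r2_single = [r_squared(y_test, [base[i][k] for i in test]) for k in range(3)]
print(f"ensemble R2 = {r2_ensemble:.3f}")
print("single-model R2 =", [round(r, 3) for r in r2_single])
```

Because the weights are estimated on training rows only, the hold-out R-squared remains an out-of-sample measure. In practice the base predictions would come from fitted LASSO, Support Vector Regression, and Boosting models rather than from simulated noise.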
In terms of performance, the machine-learning ensemble achieved an out-of-sample R-squared exceeding 76%, representing at least a 22% improvement (approximately 14-percentage-point increase in R-squared) compared to the best-performing OLS heuristic model and standard early warning systems currently in use (Sansone and Zhu, 2021, p. 18; University of Exeter, 2021). The authors conducted back-of-the-envelope calculations showing that individuals identified by the ML model as long-term recipients accrued an additional welfare cost of approximately AUD 0.99 billion compared with comparably sized groups identified under the existing actuarial profiling approach used in the government's Try, Test and Learn programme, representing roughly 10% of total annual unemployment benefit expenditure (Sansone and Zhu, 2021, p. 18). The ML algorithms also identified new powerful predictors not commonly associated with long-term welfare receipt, including annual income variability, residential relocation frequency, and failure to meet mutual obligation criteria (Austaxpolicy, 2021).
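The relative and absolute improvement figures quoted above are consistent with each other, as a rough check shows. The calculation assumes that the 22% is measured relative to the baseline R-squared and treats the reported bounds as point values. The baseline value is implied by these figures, not quoted from the paper.

```latex
R^2_{\text{OLS}} \approx \frac{R^2_{\text{ML}}}{1.22} = \frac{0.76}{1.22} \approx 0.62,
\qquad
R^2_{\text{ML}} - R^2_{\text{OLS}} \approx 0.76 - 0.62 = 0.14
```

That is, a baseline R-squared of about 62% rising to about 76% corresponds to a gain of roughly 14 percentage points, or about 22% in relative terms.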
The relevance to recertification and exit decisions lies in the system's ability to predict which individuals are most likely to remain on income support for extended periods, thereby enabling targeted recertification scheduling and resource allocation for exit-focused interventions. The paper explicitly notes that Australia's income support payments are strictly means-tested with regular eligibility assessment, and that recipients who fail to comply with mutual obligation requirements such as activity tests and job search can face sanctions including loss of payments (Sansone and Zhu, 2021, pp. 7-9). The ML predictions could inform which recipients receive more intensive casework review and which can be managed with lighter-touch recertification processes.
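One way such predictions could feed a review queue is sketched below. It is purely illustrative: the function name `triage`, the 10% review share, and the two labels are assumptions made for this example, not a rule proposed in the paper or used by any agency. The output is a suggested ordering only, consistent with the advisory role the authors describe.

```python
def triage(predicted_intensity, review_share=0.1):
    """Flag the highest-predicted `review_share` of recipients for review.

    `predicted_intensity` maps a hypothetical recipient id to the predicted
    share of the next four years spent on income support. The result is only
    a suggested queue: the final decision stays with the caseworker.
    """
    # Rank recipient ids from highest to lowest predicted intensity.
    ranked = sorted(predicted_intensity, key=predicted_intensity.get, reverse=True)
    # Always flag at least one person when the input is non-empty.
    n_review = max(1, round(len(ranked) * review_share)) if ranked else 0
    flagged = set(ranked[:n_review])
    return {
        rid: ("intensive review" if rid in flagged else "lighter-touch")
        for rid in ranked
    }


# Hypothetical predictions for five recipients.
preds = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7, "e": 0.3}
queue = triage(preds, review_share=0.4)
print(queue)
```

A fixed share is used here instead of a score threshold so that caseworker workload stays predictable; either choice would need to be evaluated against evidence on intervention effectiveness.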
The human oversight model envisaged by the researchers is explicitly advisory and complementary to caseworker expertise. The authors state that the algorithms should not replace human expertise but rather act as its complement, allowing caseworkers to focus their attention and time providing personalised service and targeting appropriate support to individuals that the algorithm identifies as most at risk (University of Exeter, 2021; IZA Newsroom, 2021). The authors also advocate for a system to monitor and audit automated decision-making, referencing the Australian Robodebt scandal as a cautionary example of the potential harms from automated welfare systems (Austaxpolicy, 2021). The predictive models can reduce conscious and unconscious biases common in human decision-making by avoiding arbitrary selection of predictors or subgroups, and have the potential to prevent cream-skimming practices where employment service providers target individuals with easier-to-achieve outcomes (Sansone and Zhu, 2021, p. 6).
The authors acknowledge limitations of the predictive approach: prediction is only a first step, and policymakers additionally require evidence on the effectiveness of specific interventions, which can only be obtained through causal methodologies such as randomised controlled trials rather than predictive modelling alone (IZA Newsroom, 2021; Sansone and Zhu, 2021, p. 7). Furthermore, the ML algorithms would need to be retrained using data from economic downturns to ensure continued accuracy during recessionary periods (Sansone and Zhu, 2021, p. 25). The authors also note persistent scepticism about algorithmic systems, driven by concerns over accuracy and bias reinforcement (IZA Newsroom, 2021). As of the most recent verification, this system remains a research prototype and has not been operationally deployed within Services Australia or any other Australian government agency.
Classification
AI Capabilities
Use Cases
Social Protection Functions
| SP Pillar (Primary) | Social assistance |
Programme Details
| Programme Name | Centrelink Income Support System (research prototype for ML-based recertification targeting) |
| Programme Type | Other |
| System Level | Implementation/delivery chain |
Australia's Centrelink income support system is administered by the Department of Social Services (DSS) / Services Australia and covers six main categories of means-tested payments: student payments, unemployment payments, parenting payments, disability payment, carer payment, and age pension. The ML model was developed as a research prototype to predict long-term income support receipt intensity and inform targeted recertification scheduling.
Implementation Details
| Implementation Type | Classical ML |
| Lifecycle Stage | Model Selection and Training |
| Model Provenance | Developed in-house |
| Compute Environment | Not documented |
| Sovereignty Quadrant | Not assessed |
| Data Residency | Not documented |
| Cross-Border Transfer | Not documented |
Risk & Oversight
| Decision Criticality | Moderate |
| Human Oversight | HITL |
| Development Process | Fully in-house |
| Highest Risk Category | Model-related risks |
| Risk Assessment Status | Informal assessment |
Risk Dimensions
Data-related risks
Governance and institutional oversight risks
Model-related risks
Operational and system integration risks
Impact Dimensions
Autonomy, human dignity and due process
Equality, non-discrimination, fairness and inclusion
Privacy and data security
Safeguards
Deployment & Outcomes
| Deployment Status | Design & Development Phase |
| Year Initiated | 2018 |
| Scale / Coverage | 1% random sample of ~5 million working-age Centrelink registrants (50,615 individuals) from a total population of ~32 million persons in DOMINO; research dataset only, not operational coverage |
| Funding Source | Australian Research Council Linkage Project LP170100472 (AUD 320,000; 2 July 2018 to 31 December 2024) |
| Technical Partners | No commercial vendor identified. Models developed by academic researchers (Dario Sansone, University of Exeter; Anna Zhu, RMIT University) using off-the-shelf ML algorithms (LASSO, SVR, Boosting). No operational deployment platform documented. |
Outcomes / Results
ML ensemble achieves out-of-sample R-squared exceeding 76%, representing at least a 22% improvement (approximately 14-percentage-point increase in R-squared) compared to the best OLS heuristic model. Individuals identified by ML accrue approximately AUD 0.99 billion more in welfare costs than those identified under existing actuarial profiling (Try, Test and Learn programme), representing roughly 10% of total annual unemployment benefit expenditure. Novel predictors identified include annual income variability, residential relocation frequency, and failure to meet mutual obligation criteria. Approach is low-cost as it uses administrative data already available to caseworkers.
Challenges
Prediction is only a first step; evidence on intervention effectiveness requires causal methods such as RCTs. ML algorithms would need retraining with economic downturn data to maintain accuracy during recessions. Persistent scepticism about algorithmic systems, driven by concerns over accuracy and bias reinforcement. No operational deployment documented despite research completion.
Sources
- SRC-001-AUS-001 Sansone, D. and Zhu, A. (2021) 'Using Machine Learning to Create an Early Warning System for Welfare Recipients', IZA Discussion Paper No. 14377. Bonn: Institute of Labor Economics.
https://docs.iza.org/dp14377.pdf
- SRC-005-AUS-001 Sansone, D. and Zhu, A. (2021) 'Machine Learning in the Welfare System', Austaxpolicy: The Tax and Transfer Policy Blog, 24 June. Available at: https://www.austaxpolicy.com/machine-learning-in-the-welfare-system/ (Accessed: 23 March 2026).
https://www.austaxpolicy.com/machine-learning-in-the-welfare-system/
- SRC-002-AUS-001 Sansone, D. and Zhu, A. (2023) 'Using Machine Learning to Create an Early Warning System for Welfare Recipients', Oxford Bulletin of Economics and Statistics, 85(5), pp. 959-992. doi:10.1111/obes.12550.
https://onlinelibrary.wiley.com/doi/10.1111/obes.12550
- SRC-003-AUS-001 Department of Social Services (2017) DOMINO (Data Over Multiple Individual Occurrences) - Dataset Standard Release, External Analytical Version. Canberra: Australian Government Department of Social Services.
https://dss.aristotlecloud.io/item/1942/dataset/domino-dataset-standard-release-external-version-f
- SRC-004-AUS-001 IZA Institute of Labor Economics (2021) 'Machine Learning in the Welfare System', IZA Newsroom, 23 June. Available at: https://newsroom.iza.org/en/archive/research/machine-learning-in-the-welfare-system/ (Accessed: 23 March 2026).
https://newsroom.iza.org/en/archive/research/machine-learning-in-the-welfare-system/
How to Cite
DCI AI Hub (2026). 'Machine-Learning Early Warning System for Income Support Recipients (Australia — Research Prototype)', AI Hub AI Tracker, case AUS-001. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/AUS-001