The Municipality of Rotterdam deployed a machine learning algorithm in 2017 to predict welfare fraud among its approximately 30,000 social assistance recipients. The system was developed by the international consulting firm Accenture, which promoted the technology in a white paper as producing 'unbiased citizen outcomes' and offering an 'ethical solution' to fraud detection. The algorithm was a gradient boosting machine, an ensemble of 500 sequentially trained decision trees, each with up to nine decision points, trained on a dataset of 12,707 past fraud investigations conducted between the system's introduction and its suspension. The model processed 315 input variables to generate a risk score between zero and one for each welfare recipient, and the roughly 1,000 to 1,500 highest-scoring individuals were selected for fraud investigation each year.
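The published description is enough to sketch the shape of such a pipeline. The following is a minimal illustration using scikit-learn's GradientBoostingClassifier with synthetic data; the parameter values mirror the reported figures (500 trees, 315 variables, 12,707 training cases), but this is not the actual Rotterdam code, and reading 'up to nine decision points' as a small per-tree depth cap is an assumption.

```python
# Minimal sketch of the described architecture, on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for the 315 recipient variables and past investigation outcomes.
X_train = rng.normal(size=(12_707, 315))       # one row per past investigation
y_train = rng.integers(0, 2, size=12_707)      # 1 = fraud found, 0 = no fraud

# Gradient boosting machine: 500 sequentially fitted decision trees.
# "Up to nine decision points" is read here as a small per-tree depth cap.
model = GradientBoostingClassifier(n_estimators=500, max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Score the full caseload and flag the highest-risk recipients for investigation.
X_recipients = rng.normal(size=(30_000, 315))
risk_scores = model.predict_proba(X_recipients)[:, 1]   # score between 0 and 1
top_k = 1_500
flagged = np.argsort(risk_scores)[::-1][:top_k]          # indices of flagged recipients
```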
The 315 input variables encompassed a wide range of personal and behavioural characteristics. Demographic variables included age, gender, marital status, and parenthood status. Linguistic variables comprised approximately 20 measures of Dutch language proficiency, spoken language, and compliance with language requirements. Financial variables covered days with financial problems and income stability. Residential variables included neighbourhood, housing type, roommate status, and tenure duration. Critically, the system also incorporated subjective caseworker assessments including observations about a recipient's physical appearance, their perceived ability to 'convince and influence others', how outgoing they were, and the length of their most recent romantic relationship. The mixture of objective demographic data with subjective behavioural assessments created a system in which caseworker biases were encoded directly into the algorithmic scoring process.
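To make the mix of variable types concrete, the record below is a purely hypothetical illustration of what a single recipient's inputs might look like, grouped by the categories described above. The field names are invented and do not come from the disclosed model.

```python
# Hypothetical single-recipient feature record, grouped by the variable
# categories described above. Field names are illustrative only.
recipient_features = {
    # Demographic
    "age": 29, "gender": "F", "marital_status": "single", "has_children": True,
    # Linguistic (roughly 20 such measures in the real system)
    "dutch_proficiency_score": 2, "meets_language_requirement": False,
    # Financial
    "days_with_financial_problems": 140, "income_stability_index": 0.3,
    # Residential
    "neighbourhood_code": "R-14", "housing_type": "social_rent",
    "has_roommates": False, "tenure_months": 18,
    # Subjective caseworker assessments encoded as model inputs
    "appearance_assessment": 3,          # caseworker rating
    "ability_to_convince_others": 2,     # caseworker rating
    "outgoingness": 1,                   # caseworker rating
    "last_relationship_length_months": 6,
}
```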
The algorithm's training data suffered from a significant structural flaw: roughly 50 percent of the cases in the dataset were labelled as fraud, whereas the actual fraud rate in the welfare population was approximately 21 percent. This overrepresentation of fraud cases meant the model was calibrated against a distorted base rate. Worse, the labels came from past investigations that were themselves shaped by existing enforcement biases and caseworker discretion, so the model learned patterns associated with being selected for investigation rather than patterns genuinely predictive of fraud, replicating and amplifying pre-existing patterns of selective scrutiny rather than identifying fraud on a neutral basis.
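The scale of the base-rate distortion is easy to see with the standard prior-shift correction for a model trained at one class prior and deployed at another. The sketch below is illustrative only: there is no indication Rotterdam applied such a correction, and no rescaling would fix the deeper problem that the labels reflect who was investigated rather than who committed fraud.

```python
# Standard prior-shift correction: rescale a predicted probability from the
# training base rate (~50% fraud) to the true population rate (~21%).
# Illustrative only; nothing suggests Rotterdam applied this.
def correct_for_prior_shift(p, train_prior=0.50, true_prior=0.21):
    """Rescale a predicted fraud probability from the training prior to the true prior."""
    num = p * (true_prior / train_prior)
    den = num + (1 - p) * ((1 - true_prior) / (1 - train_prior))
    return num / den

# A recipient scored 0.5 under the balanced training data corresponds to a far
# lower probability once the real 21% base rate is taken into account.
print(correct_for_prior_shift(0.5))   # 0.21
```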
A landmark investigation published in March 2023 by Lighthouse Reports and WIRED, titled 'Suspicion Machines', obtained unprecedented access to the source code, machine learning model file, and training data behind Rotterdam's algorithm, the first time a European government had provided complete transparency into a welfare fraud detection algorithm. Rotterdam was the only city among dozens contacted across Europe willing to share the code behind its system. The investigation subjected the algorithm to fairness testing using two standard measures: statistical parity (whether demographic groups reached the high-risk threshold at comparable rates) and controlled statistical parity (isolating the effect of specific variables by rescoring copies of the dataset with individual characteristics modified).
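In code, the two tests amount to comparing flag rates across groups and rescoring an otherwise identical copy of the data with one attribute changed. The sketch below is a generic illustration with hypothetical column names, not the journalists' actual test harness.

```python
# Generic sketch of the two fairness tests described above.
import pandas as pd

def statistical_parity(df, group_col, flag_col="flagged"):
    """Flag rate per group; ratios far from 1 between groups signal disparity."""
    return df.groupby(group_col)[flag_col].mean()

def controlled_parity_effect(df, model, feature_cols, attr, alt_value, threshold):
    """Rescore an identical copy of the data with a single attribute changed."""
    original = model.predict_proba(df[feature_cols])[:, 1]
    modified_df = df.copy()
    modified_df[attr] = alt_value          # e.g. set everyone's fluency indicator to 1
    modified = model.predict_proba(modified_df[feature_cols])[:, 1]
    # Compare how often each version crosses the investigation threshold.
    return (original >= threshold).mean(), (modified >= threshold).mean()
```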
The fairness analysis revealed systematic discrimination across multiple protected characteristics. On gender, women were 1.25 times more likely than men to be flagged for investigation, an effect that intensified when combined with parenthood; single mothers were classified as especially high-risk. On language and ethnicity, recipients who were not fluent in Dutch were almost twice as likely to be flagged as fluent speakers with otherwise identical profiles. The Netherlands Institute of Human Rights concluded that this constituted indirect discrimination on the basis of origin, since language proficiency correlates strongly with ethnic background and migration status. Age was the single most important variable in the model, nearly three times more influential than the second-ranked variable, and it carried a strong bias against younger recipients, whose risk scores fell steadily as they aged. On parenthood, parents were 1.7 times more likely than non-parents to exceed the high-risk investigation threshold, and single mothers faced compounded penalties from the intersection of gender, parenthood, and financial vulnerability.
Despite the extensive data collection and processing, independent analysis found that the algorithm's predictive performance was poor. ROC curve analysis demonstrated the system was only 50 percent more accurate than random selection of welfare recipients for investigation. An AI ethics expert who reviewed the system characterised its performance as 'essentially random guessing', suggesting that the elaborate surveillance infrastructure imposed substantial privacy costs on welfare recipients while delivering minimal improvement in fraud detection accuracy over simply selecting cases at random.
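Comparison against random selection is the point of a ROC analysis: a model with no ranking power scores an AUC of 0.5, meaning a randomly chosen fraud case is ranked above a randomly chosen non-fraud case only half the time. The snippet below is a generic illustration with synthetic labels and scores, not the investigation's evaluation code.

```python
# Generic ROC AUC comparison against a random-selection baseline (AUC = 0.5).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=5_000)        # investigation outcomes (synthetic)
model_scores = rng.random(size=5_000)          # stand-in for the model's risk scores
random_scores = rng.random(size=5_000)         # random-selection baseline

print("model AUC: ", roc_auc_score(y_true, model_scores))    # ~0.5 here: no better than chance
print("random AUC:", roc_auc_score(y_true, random_scores))   # ~0.5 by construction
```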
The algorithm was suspended in mid-2021 following a critical review by the Rekenkamer Rotterdam (Rotterdam Court of Audit), which found insufficient coordination between the algorithm's developers and the staff using it, potentially resulting in poor ethical decision-making. The audit also found that it was not possible for citizens to determine whether they had been flagged by the algorithm, and that some of the data used risked producing biased outputs. Rotterdam attempted to address the identified bias but ultimately concluded it was unable to eliminate the discrimination from the system. The city had taken over development from Accenture in 2018, but the fundamental architectural choices — including the selection of discriminatory input variables and the biased training data — persisted through subsequent iterations.
A data security incident also occurred during the investigation: Rotterdam accidentally revealed pseudonymised training data embedded in the HTML source code of histogram visualisations shared with Lighthouse Reports, which the city confirmed 'should not have happened'. The incident underscored the governance weaknesses surrounding the system's deployment and the challenges of maintaining data security in complex algorithmic systems.
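The reporting does not say which tool produced the histograms, but the mechanism is a familiar one: interactive plotting libraries serialise the underlying values into the HTML files they export, so sharing the file shares the data. The sketch below uses Plotly purely as an assumed example of that behaviour.

```python
# One common way underlying data ends up in shared HTML: libraries such as
# Plotly embed the raw values in the exported file, not just the rendered bars.
# Which tool Rotterdam actually used is not documented here.
import plotly.express as px

pseudonymised_values = [0.12, 0.87, 0.45, 0.33]   # stand-in for training-data values
fig = px.histogram(x=pseudonymised_values)
fig.write_html("histogram.html")                  # HTML/JS source contains the raw values
```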
The Rotterdam case became a centrepiece of broader European scrutiny of welfare surveillance algorithms. A European Parliament question was tabled in 2023 asking whether Rotterdam's fraud prediction algorithms constituted a violation of fundamental rights and the rule of law by the Dutch government. The case drew comparisons with the Dutch childcare benefits scandal (toeslagenaffaire), in which algorithmic profiling by the Dutch Tax Authority led to the wrongful accusation of thousands of predominantly minority families and contributed to the fall of the Rutte III government in January 2021. The Racism and Technology Center characterised Rotterdam's algorithm as 'racist technology in action', noting the systemic pattern of Dutch government agencies deploying discriminatory automated systems against vulnerable populations.