ESP-001

INSS 'AI Doctor' — ML-Based Sick Leave Fraud Detection (Modelo de Priorización de Citas)

Spain · Europe & Central Asia · High income · Full Production Deployment · Confirmed

Instituto Nacional de la Seguridad Social (INSS), under the Ministry of Inclusion, Social Security and Migration

At a Glance

What it does Prediction (including forecasting) — Compliance and integrity
Who runs it Instituto Nacional de la Seguridad Social (INSS), under the Ministry of Inclusion, Social Security and Migration
Programme Incapacidad Temporal (Temporary Disability / Sick Leave Benefits)
Confidence Confirmed
Deployment Status Full Production Deployment
Key Risks Governance and institutional oversight risks
Key Outcomes Few of the originally promised gains have materialised after 5+ years.
Source Quality 8 sources — Academic journal article, News article / media

The Instituto Nacional de la Seguridad Social (INSS) — Spain's national social security institute — deploys two XGBoost (gradient-boosted decision tree) machine learning models to assess sick leave (incapacidad temporal) cases daily. The system generates a numerical score between 0 and 1 for each worker currently on sick leave, estimating the likelihood that the worker is ready to return to work. These scores are used to prioritise which cases medical inspectors should review first, effectively creating a 'digital waiting list' that controls the order of medical inspection appointments.
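The INSS models and case data are not public, so as a minimal illustration of the 'digital waiting list' mechanism, the sketch below sorts invented cases by score so that workers the model rates most likely to be fit for work are reviewed first (all case IDs and scores are hypothetical):

```python
# Illustration only: the INSS models and data are not public, so the
# case IDs and scores below are invented. Each score in [0, 1] is the
# model's estimate that the worker is ready to return to work.

cases = [
    {"case_id": "A-101", "score": 0.12},
    {"case_id": "A-102", "score": 0.87},
    {"case_id": "A-103", "score": 0.55},
    {"case_id": "A-104", "score": 0.71},
]

# Higher-scoring cases are pushed to the front of the inspection queue;
# the resulting ordering is the 'digital waiting list' effect.
waiting_list = sorted(cases, key=lambda c: c["score"], reverse=True)

for position, case in enumerate(waiting_list, start=1):
    print(position, case["case_id"], case["score"])
```

Note that under this ordering, a worker's place in the inspection queue is determined entirely by the model output, which is why the score inputs and their weights matter for fairness.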

The scoring system operates on a four-tier scale: 0.00–0.30 indicates slow recovery expected (maintain leave), 0.31–0.60 indicates favourable progress (standard review scheduling), 0.61–0.80 indicates notable improvement (priority appointment), and 0.81–1.00 indicates imminent clearance (possible end of leave). The model draws on a range of input variables including gender, age, place of residence (which carries three times the statistical weight of specific medical diagnosis), medical diagnoses, duration of current leave, patient medical history, prior leave history, case type, medical reports from public health services, reports from mutual insurance companies (mutuas), and inspector assessments recorded in the Atrium internal application.
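Read as code, the four bands amount to simple threshold checks on the model score. The function below is a sketch using the reported cut-offs; the band labels are paraphrased, since the INSS's internal labels are not public:

```python
def tier(score: float) -> str:
    """Map a model score in [0, 1] to the four reported bands.

    Band labels are paraphrased from the published descriptions of the
    scale; the exact internal labels used by the INSS are not public.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score <= 0.30:
        return "slow recovery expected (maintain leave)"
    if score <= 0.60:
        return "favourable progress (standard review)"
    if score <= 0.80:
        return "notable improvement (priority appointment)"
    return "imminent clearance (possible end of leave)"

print(tier(0.25))
print(tier(0.72))
```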

The system was built by SAS (a US-based analytics software company) and implemented by ViewNext (a Spanish subsidiary of IBM), at a cost of at least EUR 1 million based on procurement tender documents. It was deployed and integrated into inspector workflows in 2018, and the current model version has been operational unchanged since November 2020.

Critically, the system operated in secret for approximately five years (2018–2023), with no public disclosure of its existence or functioning. It was exposed in April 2023 through an investigation by Lighthouse Reports and El Confidencial, part of the cross-border 'Suspicion Machines' investigative series that also examined algorithmic welfare systems in the Netherlands (SyRI), Serbia, and other countries. Following exposure, the Spanish Ministry of Inclusion denied transparency requests from journalists, citing that disclosure would 'compromise essential public interests' and affect system 'efficacy'.

Internal performance evaluations have revealed significant quality concerns. The system has a documented internal validation error rate of 15.4%, meaning it fails in roughly one out of every six cases. In the first half of 2025, only 35.48% of algorithmically selected workers received medical discharge, compared with 41.48% for cases selected manually by inspectors — meaning human judgment consistently outperforms the algorithm. Senior INSS officials have conceded the algorithms are 'not accurate', and expert Ana Valdivia of the Oxford Internet Institute described the false positive performance as 'poor' and 'unbalanced'. Medical inspectors working with the system daily have stated they 'are not able to explain what it is'. The system has been described as 'rendered effectively useless' due to chronic underfunding and inspector staff shortages across the INSS inspection corps.

The incapacidad temporal programme processed by this system represents a major fiscal commitment: 2024 national spending was EUR 16.5 billion (approximately 1.8% of GDP), with spending having increased 60% since 2017. An average of 1.6 million workers are on sick leave on any given day across Spain. The system would be classified as high-risk under the EU AI Act (Annex III — systems determining access to public benefits), requiring conformity assessments, transparency obligations, and human oversight provisions. Compliance with these forthcoming requirements has not been verified.

Classifications follow the DCI AI Hub Taxonomy.

Social Protection Functions

Implementation/delivery chain
Accountability mechanisms (primary) · Case management
SP Pillar (Primary): Social insurance
Programme Name: Incapacidad Temporal (Temporary Disability / Sick Leave Benefits)
Programme Type: Health Insurance
System Level: Implementation/delivery chain
Programme Description: Workers receive 60% of salary for days 4-20 of sick leave and 75% from day 21 onward. Maximum duration is 365 days, extendable by 180 days. 2024 national spending was EUR 16.5 billion (~1.8% of GDP), with an average of 1.6 million workers on sick leave on any given day.
Implementation Type: Classical ML
Lifecycle Stage: Monitoring, Maintenance and Decommissioning
Model Provenance: Commercial/proprietary
Compute Environment: Not documented
Sovereignty Quadrant: Not assessed
Data Residency: Domestic
Cross-Border Transfer: Not documented
Decision Criticality: High
Human Oversight Type: HOTL (Human-on-the-Loop)
Development Process: Fully third-party developed
Highest Risk Category: Governance and institutional oversight risks
Risk Assessment Status: Not assessed
Documented Risk Events: Internal validation error rate of 15.4% (fails in roughly one of six cases). Algorithm-selected cases have a lower discharge rate (35.48%) than manually selected cases (41.48%), meaning human judgment outperforms the algorithm. Place of residence is weighted three times higher than specific medical diagnosis. The system operated in secret for five years. The INSS refused transparency requests citing 'essential public interests'. Medical inspectors report they 'cannot explain what it is'. Senior INSS officials have conceded the algorithms are 'not accurate'. Ana Valdivia (Oxford Internet Institute) described the false positive performance as 'poor' and 'unbalanced'.
Mitigation Measures: Human oversight protocol
Category: Beneficiary registries and MIS
Sensitivity: Special category
Cross-System Linkage: Links data across multiple systems
Availability: Currently available and used
Key Constraints: Uses gender, age, place of residence, medical diagnoses, leave duration, prior leave history, medical reports from public health and mutual insurance companies (mutuas), and inspector assessments from the Atrium application. Health data is GDPR special category data.
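The benefit schedule in the programme description above (60% of salary for days 4-20, 75% from day 21) can be sketched as a per-day function. One assumption beyond what the description states: days 1-3 are treated here as carrying no statutory benefit, as under Spain's common-illness scheme; collective agreements may top any of this up:

```python
def daily_benefit(base_daily_salary: float, day_of_leave: int) -> float:
    """Statutory sick-leave benefit for one day, per the rates above.

    Simplified sketch: assumes days 1-3 carry no statutory benefit
    (common-illness scheme); collective agreements may top this up.
    """
    if day_of_leave < 1:
        raise ValueError("day_of_leave starts at 1")
    if day_of_leave <= 3:
        return 0.0
    if day_of_leave <= 20:
        return 0.60 * base_daily_salary
    return 0.75 * base_daily_salary

# Total benefit over the first 30 days for a EUR 100/day base salary:
# days 4-20 are 17 days at 60% (1020) plus days 21-30 at 75% (750).
total = sum(daily_benefit(100.0, d) for d in range(1, 31))
print(total)
```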

Sources

Nieto Garrote, A. (2025) 'Sistemas de IA en las Entidades Gestoras de la Seguridad Social', Revista de Derecho de la Seguridad Social, Laborum. [Academic journal article]

Andalucía Informa / elDiario.es (2026) 'Tu baja laboral la decide ahora un algoritmo con inteligencia artificial: el sistema del INSS está en el punto de mira'. [News article / media]

Andalucía Informa / elDiario.es (2026) 'Así es la lista de espera digital del INSS: el algoritmo secreto que decide tu alta tras una baja laboral'. [News article / media]

Fidelitis (2025) 'El algoritmo del INSS que decide tu baja médica'. [News article / media]

González, J.A. (2026) 'La Seguridad Social tira de la IA para cazar bajas laborales dudosas: un 35% de éxito', Diario de León, 23 February 2026. [News article / media]

Lighthouse Reports (2023) 'Spain's AI Doctor', Suspicion Machines series, April 2023. [News article / media]

Observatorio de Bioética y Derecho, Universitat de Barcelona (2023) 'La Seguridad Social usa una IA secreta para rastrear bajas laborales y cazar fraudes'. [Academic journal article]

The Olive Press (2023) 'How Spain's Social Security system is using artificial intelligence to identify fraudulent sick leave', 17 April 2023. [News article / media]
Deployment Status: Full Production Deployment
Year Initiated: 2018
Scale / Coverage: ~1.6 million workers on sick leave nationally; processes cases daily across all INSS inspection offices
Funding Source: Spanish Social Security budget (at least EUR 1 million in the SAS tender)
Technical Partners: SAS (fraud detection software platform); ViewNext (IBM subsidiary, implementation)
Outcomes / Results: Few of the originally promised gains have materialised after 5+ years. The system has been described as 'rendered effectively useless' due to chronic underfunding and inspector staff shortages. Algorithm-selected cases perform worse than manually selected cases (35.48% vs 41.48% discharge rate).
Challenges: Chronic INSS inspector staff shortages and budget cuts undermine the system's utility regardless of algorithm quality. A 60% increase in sick leave spending since 2017 creates political pressure to use algorithmic tools even when they underperform. The EU AI Act will classify the system as high-risk (Annex III), requiring conformity assessment, transparency, and human oversight; compliance is unverified.

How to Cite

DCI AI Hub (2026). 'INSS 'AI Doctor' — ML-Based Sick Leave Fraud Detection (Modelo de Priorización de Citas)', AI Hub AI Tracker, case ESP-001. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/ESP-001 [Accessed: 1 April 2026].

Change History

Created 30 Mar 2026, 08:38
by v2-import (import)