MEX-001

NLP Text-Mining of Performance Evaluation Justifications (Mexico Performance Evaluation System)

Mexico | Latin America & Caribbean | Upper middle income | Pilot / Controlled Trial Phase | Likely

UNDP Mexico Accelerator Lab; Performance Evaluation Unit of the Ministry of Finance and Public Credit (Secretaria de Hacienda y Credito Publico, SHCP)

At a Glance

What it does: Clustering (similarity and grouping) - Policy analysis, learning and M&E
Who runs it: UNDP Mexico Accelerator Lab; Performance Evaluation Unit of the Ministry of Finance and Public Credit (Secretaria de Hacienda y Credito Publico, SHCP)
Programme: Mexico Performance Evaluation System (Sistema de Evaluacion del Desempeno, SED) - NLP Text-Mining Component
Confidence: Likely
Deployment Status: Pilot / Controlled Trial Phase
Key Risks: Model-related risks
Key Outcomes: Enables large-scale analysis of previously unused free-text evaluation data; identifies common implementation issues across programmes; generates insights to improve reporting processes and public-spending performance; specific downstream policy or budget changes not yet documented; sources note NLP alone is insufficient without complementary UI and training interventions.
Source Quality: 3 sources (Dataset / database; Report (multilateral / development partner))

The UNDP Accelerator Lab in Mexico, in collaboration with the Performance Evaluation Unit of the Ministry of Finance and Public Credit (Secretaria de Hacienda y Credito Publico, SHCP), developed a text mining system using natural language processing (NLP) and machine learning (ML) to analyse unstructured text data generated by Mexico's national Performance Evaluation System (Sistema de Evaluacion del Desempeno, SED). The system was initiated in 2021 and its open-source codebase was released on GitHub under an MIT licence by the UNDP Accelerator Lab Mexico (acclab-mx/textmining_pnud repository).

Mexico introduced its performance evaluation system in 2010, based on the Logical Framework methodology adapted from USAID, to track the impact of public programming and spending across all federal government departments. The system requires civil servants to report against a set of performance indicators unique to each public spending initiative. As part of this reporting, civil servants must write free-text justifications in their own words to explain why specific performance indicators were not met. Over a decade of operation, this produced a substantial corpus of unstructured text data: the case file references tens of thousands of indicator records spanning 2013 to 2019. That corpus had never been systematically analysed, despite its potential to surface common barriers to policy implementation across programmes.

The NLP text-mining system was designed to address this gap. The algorithm clusters and ranks dominant themes from the free-text justification entries, compares text against a set of predefined common causes of underperformance, and identifies novel emerging themes that were not anticipated in the predefined categories. The system uses ML algorithms for clustering, topic detection, similarity scoring, and classification of the open-text justifications, all operating in Spanish. The technical implementation was built in Python 3.8 using Jupyter notebooks for exploratory analysis, with a conda-managed environment (textmining-env) for dependency management. The repository also includes a Docker-based application demonstrating language model capabilities for text analysis. The data is sourced from Mexico's budget transparency portal (transparenciapresupuestaria.gob.mx), specifically the 'Avance de indicadores' (Indicator Progress) dataset, which includes accompanying data dictionaries.
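
The comparison against predefined causes and the detection of novel themes described above can be illustrated with a minimal sketch. It is not the code from the acclab-mx/textmining_pnud repository: the cause labels, example justifications, stopword list, TF-IDF weighting, and the 0.2 threshold are all assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter

# Hypothetical predefined causes of underperformance (illustrative only).
PREDEFINED_CAUSES = {
    "budget": "recorte de presupuesto y recursos insuficientes",
    "schedule": "retraso en el calendario de actividades",
}

# Hypothetical free-text justifications (illustrative only).
JUSTIFICATIONS = [
    "No se alcanzó la meta por un recorte de presupuesto",
    "Hubo retraso en el calendario de obras",
    "La pandemia impidió realizar las visitas de campo",
]

# Tiny illustrative Spanish stopword list; a real one would be much longer.
STOPWORDS = {"no", "se", "la", "las", "el", "los", "un", "una",
             "de", "del", "en", "por", "para", "y", "que", "a"}


def tokenize(text):
    """Lowercase, keep Spanish letters only, drop stopwords."""
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_causes(justifications, causes, threshold=0.2):
    """Label each text with its closest predefined cause, or 'novel'."""
    labels = list(causes)
    vectors = tfidf_vectors(list(causes.values()) + list(justifications))
    cause_vecs = vectors[:len(labels)]
    text_vecs = vectors[len(labels):]
    results = []
    for text, vec in zip(justifications, text_vecs):
        scores = {lab: cosine(vec, cv) for lab, cv in zip(labels, cause_vecs)}
        best = max(scores, key=scores.get)
        label = best if scores[best] >= threshold else "novel"
        results.append((text, label, round(scores[best], 3)))
    return results


results = match_causes(JUSTIFICATIONS, PREDEFINED_CAUSES)
# Rank themes by how many justifications fall under each label.
theme_ranking = Counter(label for _, label, _ in results).most_common()
```

Texts labelled "novel" are the ones a reviewer would inspect for emerging themes. In practice the threshold would need tuning against labelled examples, and a fuller pipeline would add clustering of the unmatched texts.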

The project was implemented by a team consisting of Ministry of Finance evaluation unit staff, a high-ranking government official acting as sponsor, an external ML/NLP consultant, two technology department personnel, and a project coordinator from the UNDP Accelerator Lab. The focal point for the project was Luis Fernando Cervantes of the UNDP Accelerator Lab Mexico. The project was budgeted at under USD 100,000 with a minimum timeline of six months. The code was released as open source under the MIT licence, with no commercial vendor involvement.

The system is positioned as an advisory decision-support tool, not an automatic decision engine. NLP-generated clusters and ranked themes are interpreted, validated, and refined by civil servants, evaluation officials, and technical staff in what the project describes as a hybrid 'collective intelligence' model combining algorithmic classification with human feedback. The eventual goal is a real-time hybrid collective intelligence system in which NLP classification is combined with live input from civil servants to improve both the quality of evaluations and the reporting process itself.
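
One way to picture this advisory, human-validated workflow is the sketch below. It is an assumption about how such a loop could be structured, not a description of the project's implementation: the `Suggestion` class, its field names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    """An algorithmic theme suggestion awaiting human validation (hypothetical)."""
    record_id: str
    model_label: str
    model_score: float
    reviewer_label: Optional[str] = None  # set by a civil servant on review

    @property
    def final_label(self) -> Optional[str]:
        # Advisory only: nothing is final until a human has reviewed it.
        return self.reviewer_label


def review_queue(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Unreviewed suggestions, least confident first."""
    pending = [s for s in suggestions if s.reviewer_label is None]
    return sorted(pending, key=lambda s: s.model_score)


def agreement_rate(suggestions: List[Suggestion]) -> Optional[float]:
    """Share of reviewed items where the reviewer kept the model's label."""
    reviewed = [s for s in suggestions if s.reviewer_label is not None]
    if not reviewed:
        return None
    agreed = sum(s.model_label == s.reviewer_label for s in reviewed)
    return agreed / len(reviewed)
```

Tracking the agreement rate over time would give a rough signal of how far reviewers rely on the algorithmic labels, and reviewer corrections could be fed back as training examples.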

The project enables large-scale analysis of previously unused free-text evaluation data, identifying common implementation issues across programmes and generating insights intended to improve reporting processes and public spending performance. However, the UNDP sources note that the text-mining process alone is insufficient: to make the existing programme evaluation system more effective, it must be implemented alongside complementary interventions such as changes to the user interface of the reporting software and training for civil servants. No specific downstream policy or budget changes resulting from the system's outputs have been documented in the available sources. No independent governmental or third-party evaluation of the system's deployment scale or degree of institutionalisation within SHCP has been identified.

Classifications follow the DCI AI Hub Taxonomy.

Social Protection Functions

Policy
Coordination and governance + Technical and functional capacities (primary)
SP Pillar (Primary): Social assistance [Definition: The social protection branch: social assistance, social insurance, or labour market programmes.]
Programme Name: Mexico Performance Evaluation System (Sistema de Evaluacion del Desempeno, SED) - NLP Text-Mining Component
Programme Type: Other [Definition: The type of social protection programme, classified under social assistance, social insurance, or labour market programmes.]
System Level: Policy [Definition: Where in the social protection system the AI is applied: policy level, programme design, or implementation/delivery chain.]
Programme Description: Mexico's national Performance Evaluation System introduced in 2010 to track public spending performance across all federal departments using Logical Framework methodology; the NLP text-mining component analyses free-text justifications submitted by civil servants for unmet performance indicators.
Implementation Type: Classical ML [Definition: How the AI output is produced: Classical ML, Deep learning, Foundation model, or Hybrid. Affects validation, compute requirements, and governance profile.]
Lifecycle Stage: Integration and Deployment [Definition: Current stage in the AI lifecycle, from problem identification through to monitoring, maintenance and decommissioning.]
Model Provenance: Adapted from open-source [Definition: Origin of the AI model: developed in-house, adapted from open-source, commercial/proprietary, or accessed via third-party API.]
Compute Environment: Not documented [Definition: Where the AI system runs: on-premise, government cloud, commercial cloud, or edge/device.]
Sovereignty Quadrant: Not assessed [Definition: Classification of data and compute sovereignty: I (Sovereign), II (Federated/Hybrid), III (Cloud with safeguards), or IV (Shared Innovation Zone).]
Data Residency: Not documented [Definition: Where the data used by the AI system is stored: domestic, regional, or international.]
Cross-Border Transfer: Not documented [Definition: Whether data crosses national borders, and if so, whether documented safeguards are in place.]
Decision Criticality: Low [Definition: The rights impact of the decision the AI supports. High criticality requires HITL oversight; moderate requires HOTL; low may operate HOOTL.]
Human Oversight Type: HITL [Definition: Level of human involvement: Human-in-the-Loop (active review), Human-on-the-Loop (monitoring), or Human-out-of-the-Loop (periodic audit).]
Development Process: Mix of in-house and third-party [Definition: Whether the AI system was developed fully in-house, through a mix of in-house and third-party, or fully by an external provider.]
Highest Risk Category: Model-related risks [Definition: The most significant structural risk source identified: data, model, operational, governance, or market/sovereignty risks.]
Risk Assessment Status: Not assessed [Definition: Whether a formal risk assessment, informal assessment, or independent audit has been conducted for this system.]

Risk Dimensions

Governance and institutional oversight risks
Operational and system integration risks

Impact Dimensions

Autonomy, human dignity and due process
Equality, non-discrimination, fairness and inclusion
  • Human oversight protocol
Category | Sensitivity | Cross-System Linkage | Availability | Key Constraints
Administrative data from other sectors | Non-personal | Single source (no linkage) | Currently available and used | Structured performance indicator metadata and open government data from the national Performance Evaluation System
Unstructured and text-based content | Non-personal | Single source (no linkage) | Currently available and used | Free-text justifications written by civil servants in Spanish explaining unmet performance indicators; tens of thousands of indicator records from 2013-2019; sourced from transparenciapresupuestaria.gob.mx

UNDP Accelerator Lab Mexico (2021) 'textmining_pnud', GitHub repository. Available at: https://github.com/acclab-mx/textmining_pnud (Accessed: 23 March 2026).

Source type: Dataset / database

UNDP Accelerator Labs (2021) 'Text mining with natural language processing (NLP) to identify barriers to policy implementation in Mexico', in Collective Intelligence for Sustainable Development: 13 Stories from the UNDP Accelerator Labs. Available at: https://smartertogether.earth/13-stories-from-the-labs/text-mining-natural-language-processing-nlp-identify-barriers-policy (Accessed: 23 March 2026).

Source type: Report (multilateral / development partner)

UNDP Innovation Toolkit (2023) 'Text Mining Performance Evaluation', UNDP Accelerator Labs Innovation Toolkit for Signature Solutions, Governance Chapter 6.8. Available at: https://undp-accelerator-labs.github.io/Innovation-Toolkit-for-UNDP-Signature-Solutions/6.Governance/6.8%20TextMiningPerformance.html (Accessed: 23 March 2026).

Source type: Report (multilateral / development partner)

Deployment Status: Pilot / Controlled Trial Phase [Definition: How far the system has progressed into real-world operational use, from concept/exploration through to scaled and institutionalised.]
Year Initiated: 2021 [Definition: The year the AI system was first initiated or development began.]
Scale / Coverage: National-level dataset (tens of thousands of indicator records from 2013-2019 across all federal departments); deployment scale and institutionalisation unverified [Definition: The scale and geographic or population coverage of the deployment.]
Funding Source: UNDP Accelerator Lab; budget under USD 100,000 [Definition: The source(s) of funding for the AI system development and deployment.]
Technical Partners: Custom ML/NLP solution developed by UNDP Accelerator Lab Mexico with external ML/NLP consultant; open-source code released on GitHub (MIT License); no commercial vendor [Definition: External technology vendors, academic partners, or development partners involved.]
Outcomes / Results: Enables large-scale analysis of previously unused free-text evaluation data; identifies common implementation issues across programmes; generates insights to improve reporting processes and public-spending performance; specific downstream policy or budget changes not yet documented; sources note NLP alone is insufficient without complementary UI and training interventions
Challenges: Complementary interventions required (UI changes to reporting software, civil servant training) for full effectiveness; no independent evaluation of deployment scale or institutionalisation; all text data is in Spanish, requiring language-specific NLP models; v1 verification status was 'Partially Verified'

How to Cite

DCI AI Hub (2026). 'NLP Text-Mining of Performance Evaluation Justifications (Mexico Performance Evaluation System)', AI Hub AI Tracker, case MEX-001. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/MEX-001 [Accessed: 1 April 2026].

Change History

Created 30 Mar 2026, 08:40
by v2-import (import)