NLP Text-Mining of Performance Evaluation Justifications (Mexico Performance Evaluation System)

Country	Mexico

Deployment Status	Pilot / Controlled Trial Phase

Confidence	Likely

Implementing Agency	UNDP Mexico Accelerator Lab; Performance Evaluation Unit of the Ministry of Finance and Public Credit (Secretaría de Hacienda y Crédito Público, SHCP)

Overview

The UNDP Accelerator Lab in Mexico, in collaboration with the Performance Evaluation Unit of the Ministry of Finance and Public Credit (Secretaría de Hacienda y Crédito Público, SHCP), developed a text-mining system that uses natural language processing (NLP) and machine learning (ML) to analyse unstructured text generated by Mexico's national Performance Evaluation System (Sistema de Evaluación del Desempeño, SED). The project was initiated in 2021, and the UNDP Accelerator Lab Mexico released its open-source codebase on GitHub under an MIT licence (acclab-mx/textmining_pnud repository).

Mexico introduced its performance evaluation system in 2010, based on the Logical Framework methodology adapted from USAID, to track the impact of public programming and spending across all federal government departments. The system requires civil servants to report against a set of performance indicators unique to each public spending initiative. As part of this reporting, civil servants must write free-text justifications explaining why specific performance indicators were not met. Over a decade of operation, this produced a substantial corpus of unstructured text (the case file references tens of thousands of indicator records spanning 2013 to 2019). The corpus had never been systematically analysed, despite its potential to surface common barriers to policy implementation across programmes.

The NLP text-mining system was designed to address this gap. The algorithm clusters and ranks dominant themes from the free-text justification entries, compares text against a set of predefined common causes of underperformance, and identifies novel emerging themes that were not anticipated in the predefined categories. The system uses ML algorithms for clustering, topic detection, similarity scoring, and classification of the open-text justifications, all operating in Spanish. The technical implementation was built in Python 3.8 using Jupyter notebooks for exploratory analysis, with a conda-managed environment (textmining-env) for dependency management. The repository also includes a Docker-based application demonstrating language model capabilities for text analysis. The data is sourced from Mexico's budget transparency portal (transparenciapresupuestaria.gob.mx), specifically the 'Avance de indicadores' (Indicator Progress) dataset, which includes accompanying data dictionaries.
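To make the comparison step concrete, the following is a minimal sketch, not the project's actual implementation, of scoring justifications against predefined causes using TF-IDF weights and cosine similarity with only the Python standard library. The cause labels and keyword strings, the sample justifications, the 0.1 threshold, and all function names are hypothetical; the repository may rely on different techniques and libraries. Justifications whose best score falls below the threshold are left unlabelled as candidates for novel themes, to be reviewed by a person.

```python
import math
import re
from collections import Counter

# Hypothetical predefined causes (label -> keyword description); illustrative only.
PREDEFINED_CAUSES = {
    "presupuesto": "recorte reduccion presupuesto recursos insuficientes",
    "demanda": "baja demanda solicitudes beneficiarios menor a la esperada",
    "retraso": "retraso demora tramites administrativos licitacion",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens, including Spanish accented letters."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, so every term keeps a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_causes(justifications, causes=PREDEFINED_CAUSES, threshold=0.1):
    """Return (text, best_label_or_None, best_score) for each justification."""
    justifications = list(justifications)
    labels = list(causes)
    # Causes and justifications share one vocabulary so their weights are comparable.
    vectors = tfidf_vectors([causes[label] for label in labels] + justifications)
    cause_vecs, text_vecs = vectors[: len(labels)], vectors[len(labels):]
    results = []
    for text, vec in zip(justifications, text_vecs):
        scores = {label: cosine(vec, cv) for label, cv in zip(labels, cause_vecs)}
        best = max(scores, key=scores.get)
        # Below the threshold, no predefined cause fits: flag as a possible new theme.
        label = best if scores[best] >= threshold else None
        results.append((text, label, scores[best]))
    return results


# Invented sample justifications, for illustration only.
sample = [
    "El recorte al presupuesto redujo los recursos disponibles",
    "Se registró una demanda menor de solicitudes",
    "Condiciones climáticas adversas impidieron las visitas de campo",
]
for text, label, score in match_causes(sample):
    print(label or "possible new theme", round(score, 2), text)
```

The threshold trades off coverage against precision: a lower value assigns more justifications to predefined causes, while a higher value sends more of them to human review as potential emerging themes. A production pipeline would likely also need accent normalisation, stop-word removal, and stemming or lemmatisation for Spanish, none of which this sketch attempts.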

The project team consisted of staff from the Ministry of Finance evaluation unit, a high-ranking government official acting as sponsor, an external ML/NLP consultant, two technology department personnel, and a project coordinator from the UNDP Accelerator Lab. The focal point was Luis Fernando Cervantes of the UNDP Accelerator Lab Mexico. The project was budgeted at under USD 100,000, with a minimum timeline of six months. The code was released as open source under the MIT licence, and no commercial vendor was involved.

The system is positioned as an advisory decision-support tool, not an automatic decision engine. NLP-generated clusters and ranked themes are interpreted, validated, and refined by civil servants, evaluation officials, and technical staff in what the project describes as a hybrid 'collective intelligence' model combining algorithmic classification with human feedback. The eventual goal is a hybrid collective intelligence system in which NLP classification is combined with inputs from civil servants in real time, improving both the quality of evaluations and the reporting process itself.

The project enables large-scale analysis of previously unused free-text evaluation data, identifying common implementation issues across programmes and generating insights intended to improve reporting processes and public spending performance. However, the UNDP sources note that text mining alone is insufficient: it needs to be implemented alongside complementary interventions, such as changes to the user interface of the reporting software and training for civil servants, to make the existing programme evaluation system more effective. No specific downstream policy or budget changes resulting from the system's outputs have been documented in the available sources. No independent governmental or third-party evaluation of the system's deployment scale or degree of institutionalisation within SHCP has been identified.

Classification

AI Capabilities

Clustering (similarity and grouping) (primary)
Classification
Perception and extraction from unstructured inputs

Use Cases

Policy analysis, learning and M&E (primary)

Social Protection Functions

Policy: Coordination and governance + Technical and functional capacities (primary)

SP Pillar (Primary)

Social assistance

Programme Details

Programme Name	Mexico Performance Evaluation System (Sistema de Evaluación del Desempeño, SED): NLP Text-Mining Component
Programme Type	Other
System Level	Policy

Mexico's national Performance Evaluation System was introduced in 2010 to track public spending performance across all federal departments using the Logical Framework methodology. The NLP text-mining component analyses the free-text justifications that civil servants submit for unmet performance indicators.

Implementation Details

Implementation Type	Classical ML
Lifecycle Stage	Integration and Deployment
Model Provenance	Adapted from open-source
Compute Environment	Not documented
Sovereignty Quadrant	Not assessed
Data Residency	Not documented
Cross-Border Transfer	Not documented

Risk & Oversight

Decision Criticality	Low
Human Oversight	HITL
Development Process	Mix of in-house and third-party
Highest Risk Category	Model-related risks
Risk Assessment Status	Not assessed

Risk Dimensions

Data-related risks

Data quality failure
Representation bias

Governance and institutional oversight risks

Weak documentation or auditability

Model-related risks

Opacity or limited explainability
Reliability or generalisation failure
Subgroup bias

Operational and system integration risks

Inadequate real-world validation

Impact Dimensions

Autonomy, human dignity and due process

Opaque or unexplained decision

Equality, non-discrimination, fairness and inclusion

Reinforcement of structural inequity

Safeguards

Human oversight protocol

Deployment & Outcomes

Deployment Status	Pilot / Controlled Trial Phase
Year Initiated	2021
Scale / Coverage	National-level dataset (tens of thousands of indicator records from 2013-2019 across all federal departments); deployment scale and institutionalisation unverified
Funding Source	UNDP Accelerator Lab; budget under USD 100,000
Technical Partners	Custom ML/NLP solution developed by UNDP Accelerator Lab Mexico with external ML/NLP consultant; open-source code released on GitHub (MIT License); no commercial vendor

Outcomes / Results

Enables large-scale analysis of previously unused free-text evaluation data; identifies common implementation issues across programmes; generates insights to improve reporting processes and public-spending performance; specific downstream policy or budget changes not yet documented; sources note NLP alone is insufficient without complementary UI and training interventions

Challenges

Complementary interventions required (UI changes to reporting software, civil servant training) for full effectiveness; no independent evaluation of deployment scale or institutionalisation; all text data is in Spanish, requiring language-specific NLP models; v1 verification status was 'Partially Verified'

Sources

SRC-003-MEX-001 UNDP Accelerator Lab Mexico (2021) 'textmining_pnud', GitHub repository. Available at: https://github.com/acclab-mx/textmining_pnud (Accessed: 23 March 2026).
SRC-002-MEX-001 UNDP Accelerator Labs (2021) 'Text mining with natural language processing (NLP) to identify barriers to policy implementation in Mexico', in Collective Intelligence for Sustainable Development: 13 Stories from the UNDP Accelerator Labs. Available at: https://smartertogether.earth/13-stories-from-the-labs/text-mining-natural-language-processing-nlp-identify-barriers-policy (Accessed: 23 March 2026).
SRC-001-MEX-001 UNDP Innovation Toolkit (2023) 'Text Mining Performance Evaluation', UNDP Accelerator Labs Innovation Toolkit for Signature Solutions, Governance Chapter 6.8. Available at: https://undp-accelerator-labs.github.io/Innovation-Toolkit-for-UNDP-Signature-Solutions/6.Governance/6.8%20TextMiningPerformance.html (Accessed: 23 March 2026).

How to Cite

DCI AI Hub (2026). 'NLP Text-Mining of Performance Evaluation Justifications (Mexico Performance Evaluation System)', AI Hub AI Tracker, case MEX-001. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/MEX-001
