Novissi -- Machine Learning Model Performance Monitoring and Bias Detection

Country	Togo

Deployment Status	Pilot / Controlled Trial Phase

Confidence	Confirmed

Implementing Agency	Government of Togo -- Ministry of Digital Economy and Digital Transformation (MENTD)

Overview

The Government of Togo, in collaboration with academic research partners from the University of California, Berkeley Center for Effective Global Action (CEGA), the Data-Intensive Development Lab, J-PAL, and Northwestern University, conducted a systematic evaluation of the machine learning model used to target emergency cash transfers under the Novissi social protection programme. This evaluation component -- distinct from the targeting model itself (documented separately as TGO-001) -- monitored the performance, fairness, and error behaviour of the phone-based poverty prediction algorithms used to prioritise the poorest individuals for COVID-19 relief payments between November 2020 and March 2021.

The Novissi programme was launched in April 2020 by Togo's Ministry of Digital Economy and Digital Transformation (MENTD) as an emergency cash transfer programme for informal sector workers affected by the COVID-19 pandemic. The first phase distributed payments to 572,852 informal workers in the greater Lomé area (World Bank, 2021). In a second phase, 57,000 new beneficiaries in the 100 poorest rural cantons were identified using machine learning algorithms trained on anonymised mobile phone metadata -- specifically call detail records (CDR) -- to predict individual consumption levels for 5.7 million subscribers, representing approximately 70 percent of Togo's population (World Bank, 2021; Aiken et al., 2022).

The performance monitoring and evaluation effort assessed how accurately the machine learning targeting model identified the poorest individuals compared to alternative targeting approaches. The evaluation used ground-truth consumption and wealth data collected through two independent datasets: a nationally representative field survey conducted in 2018-2019 (n = 6,171 households) and a large phone survey of active mobile subscribers conducted in 2020 (n = 8,915 individuals). These ground-truth datasets provided the benchmark against which model predictions were compared (Aiken et al., 2022).

The evaluation framework applied multiple performance metrics including classification accuracy (proportion of correctly identified poor and non-poor), precision (proportion of targeted individuals who are truly poor), exclusion error (proportion of truly poor individuals excluded from benefits), and inclusion error (proportion of non-poor individuals incorrectly receiving benefits). These metrics were calculated across different targeting thresholds and compared against several counterfactual targeting methods: geographic targeting at the prefecture and canton level, occupation-based targeting, and proxy means testing (PMT) -- the latter representing a hypothetical benchmark since no comprehensive social registry existed in Togo at the time (Aiken et al., 2022).
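The four metrics defined above can be illustrated with a minimal Python sketch. It is not the evaluation team's code: the function name `targeting_metrics`, the `TargetingMetrics` container, and the toy data are hypothetical, and the inclusion error follows the wording of this case study (share of non-poor individuals who are targeted), which may differ from the convention used in other targeting studies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TargetingMetrics:
    accuracy: float
    precision: float
    exclusion_error: float
    inclusion_error: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def targeting_metrics(
    is_poor: Sequence[bool], is_targeted: Sequence[bool]
) -> TargetingMetrics:
    """Compare ground-truth poverty status with targeting decisions."""
    if len(is_poor) != len(is_targeted):
        raise ValueError("is_poor and is_targeted must have the same length")
    pairs = list(zip(is_poor, is_targeted))
    tp = sum(1 for poor, targeted in pairs if poor and targeted)
    fp = sum(1 for poor, targeted in pairs if not poor and targeted)
    fn = sum(1 for poor, targeted in pairs if poor and not targeted)
    tn = sum(1 for poor, targeted in pairs if not poor and not targeted)
    return TargetingMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),   # correctly classified / everyone
        precision=_ratio(tp, tp + fp),          # truly poor / targeted
        exclusion_error=_ratio(fn, tp + fn),    # poor but not targeted / truly poor
        inclusion_error=_ratio(fp, fp + tn),    # non-poor but targeted / non-poor
    )


# Made-up toy data (not from the study): 3 truly poor people, 3 people targeted.
toy_is_poor = [True, True, True, False, False, False, False, False]
toy_is_targeted = [True, True, False, True, False, False, False, False]
toy_metrics = targeting_metrics(toy_is_poor, toy_is_targeted)
```

Repeating the calculation for each targeting threshold and each counterfactual method would produce the kind of comparison the evaluation describes.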

The evaluation found that the phone-based machine learning targeting approach reduced exclusion errors by 4 to 21 percent relative to the geographic targeting alternatives that were considered by the Government of Togo. However, relative to methods that would require a comprehensive social registry -- specifically proxy means testing based on detailed household survey data -- the machine learning approach increased exclusion errors by 9 to 35 percent. This finding matters for future programme design: phone-based ML targeting outperformed the alternatives that were feasible during the crisis, but it would not outperform traditional PMT if registry data were available (Aiken et al., 2022).
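As a rough illustration of how such comparisons can be expressed, the sketch below computes the percentage change in exclusion error between a baseline method and an alternative. It assumes the reported figures are relative changes rather than percentage-point differences, which this case study does not spell out. The function name and the input values are hypothetical and are not results from the evaluation.

```python
def relative_change_pct(baseline_error: float, alternative_error: float) -> float:
    """Percentage change in an error rate when moving from baseline to alternative.

    Negative values mean the alternative has a lower error rate than the baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (alternative_error - baseline_error) / baseline_error


# Hypothetical error rates chosen only to illustrate the arithmetic.
hypothetical_geographic_error = 0.50
hypothetical_ml_error = 0.45
change = relative_change_pct(hypothetical_geographic_error, hypothetical_ml_error)
# change is approximately -10, i.e. a 10 percent relative reduction.
```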

A critical component of the evaluation addressed algorithmic fairness and potential bias. The research team planned in-person surveys for 2021 specifically designed to detect inadvertent predictive bias against vulnerable subgroups, including women, illiterate individuals, and other marginalised populations. The fairness analysis disaggregated performance metrics by demographic characteristics including gender, literacy status, and other marginalisation indicators to identify whether the model systematically underperformed for specific groups (World Bank, 2021; Aiken et al., 2022).
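A disaggregated check of this kind can be sketched as follows. This is an illustrative assumption about how a subgroup analysis might be set up, not the research team's procedure: the record layout, the field names (`gender`, `is_poor`, `is_targeted`), and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def exclusion_error_by_group(
    records: Iterable[Mapping[str, object]], group_key: str
) -> Dict[object, float]:
    """Exclusion error (truly poor but not targeted) for each subgroup.

    Groups with no truly poor members are omitted, since the rate is undefined.
    """
    poor_count: Dict[object, int] = defaultdict(int)
    missed_count: Dict[object, int] = defaultdict(int)
    for record in records:
        if not record["is_poor"]:
            continue  # exclusion error only concerns truly poor individuals
        group = record[group_key]
        poor_count[group] += 1
        if not record["is_targeted"]:
            missed_count[group] += 1
    return {group: missed_count[group] / poor_count[group] for group in poor_count}


# Made-up records (not survey data): 4 poor women, 4 poor men, 2 non-poor people.
toy_records = (
    [{"gender": "women", "is_poor": True, "is_targeted": t} for t in (True, True, False, False)]
    + [{"gender": "men", "is_poor": True, "is_targeted": t} for t in (True, True, True, False)]
    + [{"gender": "women", "is_poor": False, "is_targeted": True},
       {"gender": "men", "is_poor": False, "is_targeted": False}]
)
toy_rates = exclusion_error_by_group(toy_records, "gender")
toy_gap = toy_rates["women"] - toy_rates["men"]
```

A persistent gap between groups in such a table would be one signal of the subgroup bias the planned surveys were meant to detect. Whether a gap is meaningful also depends on sample sizes, which this sketch does not address.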

The machine learning models evaluated were classical ML algorithms -- specifically gradient boosted decision trees and regularised logistic regression models -- trained on 857 to 1,042 CDR features describing aspects of each subscriber's mobile phone behaviour. These features included call and SMS activity patterns, mobility indicators derived from cell tower locations, social network characteristics, mobile data consumption, and expenditure on airtime and mobile money (Aiken et al., 2022). The evaluation results were published in the peer-reviewed journal Nature in 2022, enabling independent external scrutiny of the methodology and findings. The research was conducted under academic institutional review board (IRB) protocols, though no specific regulatory or legal framework governing the monitoring component was documented in the primary sources.

Classification

AI Capabilities

Classification (primary); Prediction (including forecasting)

Use Cases

Policy analysis, learning and M&E (primary); Data quality and anomaly detection

Social Protection Functions

Implementation/delivery chain: Monitoring and evaluation (primary); Programme design: Eligibility criteria and qualifying conditions

SP Pillar (Primary)

Social assistance

Programme Details

Programme Name	Novissi Emergency Cash Transfer Programme -- ML Performance Monitoring Component
Programme Type	Emergency Cash Transfers
System Level	Implementation/delivery chain

Systematic evaluation of the machine learning poverty prediction model used to target beneficiaries under the Novissi emergency cash transfer programme, assessing model accuracy, exclusion/inclusion errors, and algorithmic fairness across demographic subgroups.

Implementation Details

Implementation Type	Classical ML
Lifecycle Stage	Monitoring, Maintenance and Decommissioning
Model Provenance	Adapted from open-source
Compute Environment	Not documented
Sovereignty Quadrant	IV -- Shared Innovation Zone
Data Residency	International
Cross-Border Transfer	Without documented safeguards

Risk & Oversight

Decision Criticality	Moderate
Human Oversight	Human-on-the-loop (HOTL)
Development Process	Mix of in-house and third-party
Highest Risk Category	Model-related risks
Risk Assessment Status	Independent audit completed

Risk Dimensions

Data-related risks

Consent or lawful basis gap; Data or concept drift; Representation bias; Weak provenance or lineage

Market, sovereignty and industry structure risks

Jurisdictional hosting risk; LMIC power asymmetry

Model-related risks

Subgroup bias

Impact Dimensions

Equality, non-discrimination, fairness and inclusion

Discriminatory outcome; Disparate error rates across groups; Systematic exclusion from benefits or services

Safeguards

Bias audit; Independent evaluation

Deployment & Outcomes

Deployment Status	Pilot / Controlled Trial Phase
Year Initiated	2020
Scale / Coverage	National evaluation scope -- model predictions covered 5.7 million mobile subscribers (approximately 70% of the population); ground-truth validation surveys covered approximately 15,000 individuals
Funding Source	World Bank IDA financing under the West Africa Unique Identification for Regional Integration and Inclusion (WURI) Programme
Technical Partners	UC Berkeley Center for Effective Global Action (CEGA); Data-Intensive Development Lab; J-PAL; Northwestern University; Innovations for Poverty Action (IPA)

Outcomes / Results

Phone-based ML targeting reduced exclusion errors by 4-21% relative to geographic targeting alternatives; traditional PMT would outperform ML if registry data were available (9-35% higher exclusion errors for ML vs PMT); findings published in Nature (2022) contributing to global evidence on AI targeting performance in social protection.

Challenges

No comprehensive social registry existed in Togo, limiting the feasibility of traditional PMT approaches; phone survey respondents skewed younger, poorer, and more male than the general population; fairness evaluation for vulnerable subgroups (women, illiterate individuals, and other marginalised populations) required additional in-person surveys planned for 2021.

Sources

SRC-001-TGO-004 Aiken, E., Bellue, S., Karlan, D., Udry, C. and Blumenstock, J. (2022). Machine learning and phone data can improve targeting of humanitarian aid. Nature, 603, pp.864-870. DOI: 10.1038/s41586-022-04484-9. Available at: https://www.nature.com/articles/s41586-022-04484-9 (Accessed 24 Mar 2026).
SRC-003-TGO-004 Aiken, E. (2022). 'togo-targeting-replication', GitHub repository. Available at: https://github.com/emilylaiken/togo-targeting-replication (Accessed 27 Mar 2026).
SRC-002-TGO-004 World Bank (2021). Prioritizing the poorest and most vulnerable in West Africa: Togo's Novissi platform for social protection uses machine learning, geospatial analytics, and mobile phone metadata for the pandemic response. Washington, DC: World Bank. Available at: https://www.worldbank.org/en/results/2021/04/13/prioritizing-the-poorest-and-most-vulnerable-in-west-africa-togo-s-novissi-platform-for-social-protection-uses-machine-l (Accessed 24 Mar 2026).

How to Cite

DCI AI Hub (2026). 'Novissi -- Machine Learning Model Performance Monitoring and Bias Detection', AI Hub AI Tracker, case TGO-004. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/TGO-004
