Novissi -- Machine Learning Model Performance Monitoring and Bias Detection
Overview
The Government of Togo, in collaboration with academic research partners from the University of California, Berkeley Center for Effective Global Action (CEGA), the Data-Intensive Development Lab, J-PAL, and Northwestern University, conducted a systematic evaluation of the machine learning model used to target emergency cash transfers under the Novissi social protection programme. This evaluation component -- distinct from the targeting model itself (documented separately as TGO-001) -- focused on monitoring the performance, fairness, and error behaviour of the phone-based poverty prediction algorithms that were used to prioritise the poorest individuals for COVID-19 relief payments between November 2020 and March 2021.
The Novissi programme was launched in April 2020 by Togo's Ministry of Digital Economy and Digital Transformation (MENTD) as an emergency cash transfer programme for informal sector workers affected by the COVID-19 pandemic. The first phase distributed payments to 572,852 informal workers in the greater Lomé area (World Bank, 2021). In a second phase, 57,000 new beneficiaries in the 100 poorest rural cantons were identified using machine learning algorithms trained on anonymised mobile phone metadata -- specifically call detail records (CDR) -- to predict individual consumption levels for 5.7 million subscribers, representing approximately 70 percent of Togo's population (World Bank, 2021; Aiken et al., 2022).
The performance monitoring and evaluation effort assessed how accurately the machine learning targeting model identified the poorest individuals compared to alternative targeting approaches. The evaluation used ground-truth consumption and wealth data collected through two independent datasets: a nationally representative field survey conducted in 2018-2019 (n = 6,171 households) and a large phone survey of active mobile subscribers conducted in 2020 (n = 8,915 individuals). These ground-truth datasets provided the benchmark against which model predictions were compared (Aiken et al., 2022).
The evaluation framework applied multiple performance metrics including classification accuracy (proportion of correctly identified poor and non-poor), precision (proportion of targeted individuals who are truly poor), exclusion error (proportion of truly poor individuals excluded from benefits), and inclusion error (proportion of non-poor individuals incorrectly receiving benefits). These metrics were calculated across different targeting thresholds and compared against several counterfactual targeting methods: geographic targeting at the prefecture and canton level, occupation-based targeting, and proxy means testing (PMT) -- the latter representing a hypothetical benchmark since no comprehensive social registry existed in Togo at the time (Aiken et al., 2022).
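The four metrics defined above can be illustrated with a minimal Python sketch. It is not the evaluation team's code. It assumes that the "truly poor" are the lowest-consumption share of the surveyed sample and that the same share is targeted using predicted consumption, and it reads inclusion error as the share of non-poor people who are targeted, following the wording above. The function names and the eight consumption values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TargetingMetrics:
    accuracy: float         # share of people classified correctly (poor or non-poor)
    precision: float        # share of targeted people who are truly poor
    exclusion_error: float  # share of truly poor people who are not targeted
    inclusion_error: float  # share of non-poor people who are targeted


def target_poorest(consumption, share):
    """Flag the `share` of people with the lowest consumption values."""
    n_flagged = int(share * len(consumption))
    ranked = sorted(range(len(consumption)), key=lambda i: consumption[i])
    flagged = set(ranked[:n_flagged])
    return [i in flagged for i in range(len(consumption))]


def _ratio(numerator, denominator):
    # Return NaN instead of raising when the denominator group is empty.
    return numerator / denominator if denominator else float("nan")


def targeting_metrics(is_poor, is_targeted):
    """Compute the four metrics from paired ground-truth and targeting flags."""
    pairs = list(zip(is_poor, is_targeted))
    tp = sum(1 for poor, hit in pairs if poor and hit)
    fp = sum(1 for poor, hit in pairs if not poor and hit)
    fn = sum(1 for poor, hit in pairs if poor and not hit)
    tn = sum(1 for poor, hit in pairs if not poor and not hit)
    return TargetingMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=_ratio(tp, tp + fp),
        exclusion_error=_ratio(fn, tp + fn),
        inclusion_error=_ratio(fp, fp + tn),
    )


# Made-up consumption values for eight people (not data from the evaluation).
surveyed = [1, 2, 3, 4, 5, 6, 7, 8]
predicted = [2, 1, 5, 3, 4, 8, 6, 7]

# Recompute the metrics at several targeting thresholds.
for share in (0.25, 0.375, 0.5):
    metrics = targeting_metrics(
        target_poorest(surveyed, share),
        target_poorest(predicted, share),
    )
    print(share, metrics)
```

Because the same share is flagged on both sides in this sketch, the number of wrongly excluded people always equals the number of wrongly included people; the rates differ only because their denominators differ.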
The evaluation found that the phone-based machine learning targeting approach reduced exclusion errors by 4 to 21 percent relative to the geographic targeting alternatives that were considered by the Government of Togo. However, relative to methods that would require a comprehensive social registry -- specifically proxy means testing based on detailed household survey data -- the machine learning approach increased exclusion errors by 9 to 35 percent. This finding was significant for informing future programme design, as it demonstrated that while phone-based ML targeting outperformed the feasible alternatives available during the crisis, it would not outperform traditional PMT if registry data were available (Aiken et al., 2022).
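As a rough illustration of how such comparisons are expressed, the sketch below computes a relative change in exclusion error between two targeting methods. It assumes the reported percentages are relative changes rather than percentage-point differences, which the text above does not state explicitly. The function name and all three error rates are hypothetical and chosen only to show the arithmetic.

```python
def relative_change(candidate_error, baseline_error):
    """Relative change of candidate vs baseline; negative means fewer errors."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (candidate_error - baseline_error) / baseline_error


# Hypothetical exclusion-error rates, not figures from the evaluation.
ml_error = 0.45
geographic_error = 0.50
pmt_error = 0.36

print(f"ML vs geographic targeting: {relative_change(ml_error, geographic_error):+.0%}")
print(f"ML vs PMT: {relative_change(ml_error, pmt_error):+.0%}")
```

With these made-up rates the ML method shows a relative reduction against the geographic baseline and a relative increase against the PMT baseline, which is the same pattern the evaluation reports.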
A critical component of the evaluation addressed algorithmic fairness and potential bias. The research team planned in-person surveys for 2021 specifically designed to detect inadvertent predictive bias against vulnerable subgroups, including women, illiterate individuals, and other marginalised populations. The fairness analysis disaggregated performance metrics by demographic characteristics including gender, literacy status, and other marginalisation indicators to identify whether the model systematically underperformed for specific groups (World Bank, 2021; Aiken et al., 2022).
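A disaggregated check of this kind can be sketched as follows: the exclusion error is computed separately for each subgroup and the results are compared. This is an illustrative sketch, not the research team's procedure. The record layout, the group labels, and the toy records are assumptions made for the example.

```python
from collections import defaultdict


def exclusion_error_by_group(records):
    """Exclusion error per subgroup.

    `records` is an iterable of (group, is_poor, is_targeted) tuples.
    Groups with no truly poor members map to None.
    """
    poor = defaultdict(int)      # truly poor people per group
    excluded = defaultdict(int)  # truly poor people who were not targeted
    groups = set()
    for group, is_poor, is_targeted in records:
        groups.add(group)
        if is_poor:
            poor[group] += 1
            if not is_targeted:
                excluded[group] += 1
    return {
        g: (excluded[g] / poor[g] if poor[g] else None)
        for g in sorted(groups)
    }


# Toy records (hypothetical, not survey data): (group, is_poor, is_targeted)
records = [
    ("women", True, True), ("women", True, False),
    ("women", True, False), ("women", False, False),
    ("men", True, True), ("men", True, True),
    ("men", True, False), ("men", False, True),
]

by_group = exclusion_error_by_group(records)
for group, error in by_group.items():
    print(group, error)
```

A large gap between subgroups in such a table would suggest that the model misses poor people in one group more often than in another, which is the kind of systematic underperformance the fairness analysis was designed to detect.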
The machine learning models evaluated were classical ML algorithms -- specifically gradient boosted decision trees and regularised logistic regression models -- trained on 857 to 1,042 CDR features describing aspects of each subscriber's mobile phone behaviour. These features included call and SMS activity patterns, mobility indicators derived from cell tower locations, social network characteristics, mobile data consumption, and expenditure on airtime and mobile money (Aiken et al., 2022). The evaluation results were published in the peer-reviewed journal Nature in 2022, enabling independent external scrutiny of the methodology and findings. The research was conducted under academic institutional review board (IRB) protocols, though no specific regulatory or legal framework governing the monitoring component was documented in the primary sources.
Classification
AI Capabilities
Use Cases
Social Protection Functions
| SP Pillar (Primary) | Social assistance |
Programme Details
| Programme Name | Novissi Emergency Cash Transfer Programme -- ML Performance Monitoring Component |
| Programme Type | Emergency Cash Transfers |
| System Level | Implementation/delivery chain |
Systematic evaluation of the machine learning poverty prediction model used to target beneficiaries under the Novissi emergency cash transfer programme, assessing model accuracy, exclusion/inclusion errors, and algorithmic fairness across demographic subgroups.
Implementation Details
| Implementation Type | Classical ML |
| Lifecycle Stage | Monitoring, Maintenance and Decommissioning |
| Model Provenance | Adapted from open-source |
| Compute Environment | Not documented |
| Sovereignty Quadrant | IV — Shared Innovation Zone |
| Data Residency | International |
| Cross-Border Transfer | Without documented safeguards |
Risk & Oversight
| Decision Criticality | Moderate |
| Human Oversight | HOTL |
| Development Process | Mix of in-house and third-party |
| Highest Risk Category | Model-related risks |
| Risk Assessment Status | Independent audit completed |
Risk Dimensions
Data-related risks
Market, sovereignty and industry structure risks
Model-related risks
Impact Dimensions
Equality, non-discrimination, fairness and inclusion
Safeguards
Deployment & Outcomes
| Deployment Status | Pilot / Controlled Trial Phase |
| Year Initiated | 2020 |
| Scale / Coverage | National evaluation scope -- model predictions covered 5.7 million mobile subscribers (approximately 70% of the population); ground-truth validation surveys covered 6,171 households (2018-2019 field survey) and 8,915 individuals (2020 phone survey) |
| Funding Source | World Bank IDA financing under the West Africa Unique Identification for Regional Integration and Inclusion (WURI) Programme |
| Technical Partners | UC Berkeley Center for Effective Global Action (CEGA); Data-Intensive Development Lab; J-PAL; Northwestern University; Innovations for Poverty Action (IPA) |
Outcomes / Results
Phone-based ML targeting reduced exclusion errors by 4-21% relative to geographic targeting alternatives; traditional PMT would outperform ML if registry data were available (9-35% higher exclusion errors for ML vs PMT); findings published in Nature (2022) contributing to global evidence on AI targeting performance in social protection.
Challenges
No comprehensive social registry existed in Togo, limiting the feasibility of traditional PMT approaches; phone survey respondents skewed younger, poorer, and more male than the general population; fairness evaluation for vulnerable subgroups (women, illiterate, marginalised) required additional in-person surveys planned for 2021.
Sources
- SRC-001-TGO-004 Aiken, E., Bellue, S., Karlan, D., Udry, C. and Blumenstock, J. (2022). Machine learning and phone data can improve targeting of humanitarian aid. Nature, 603, pp.864-870. DOI: 10.1038/s41586-022-04484-9. Available at: https://www.nature.com/articles/s41586-022-04484-9 (Accessed 24 Mar 2026).
- SRC-003-TGO-004 Aiken, E. (2022). togo-targeting-replication. GitHub repository. Available at: https://github.com/emilylaiken/togo-targeting-replication (Accessed 27 Mar 2026).
- SRC-002-TGO-004 World Bank (2021). Prioritizing the poorest and most vulnerable in West Africa: Togo's Novissi platform for social protection uses machine learning, geospatial analytics, and mobile phone metadata for the pandemic response. Washington, DC: World Bank. Available at: https://www.worldbank.org/en/results/2021/04/13/prioritizing-the-poorest-and-most-vulnerable-in-west-africa-togo-s-novissi-platform-for-social-protection-uses-machine-l (Accessed 24 Mar 2026).
How to Cite
DCI AI Hub (2026). 'Novissi -- Machine Learning Model Performance Monitoring and Bias Detection', AI Hub AI Tracker, case TGO-004. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/TGO-004