TGO-004

Novissi -- Machine Learning Model Performance Monitoring and Bias Detection

Togo | Sub-Saharan Africa | Low income | Pilot / Controlled Trial Phase | Confirmed

Government of Togo -- Ministry of Digital Economy and Digital Transformation (MENTD)

At a Glance

What it does: Classification -- Policy analysis, learning and M&E
Who runs it: Government of Togo -- Ministry of Digital Economy and Digital Transformation (MENTD)
Programme: Novissi Emergency Cash Transfer Programme -- ML Performance Monitoring Component
Confidence: Confirmed
Deployment Status: Pilot / Controlled Trial Phase
Key Risks: Model-related risks
Key Outcomes: Phone-based ML targeting reduced exclusion errors by 4-21% relative to geographic targeting alternatives; traditional PMT would outperform ML if registry data were available (9-35% higher exclusion errors for ML vs PMT); findings published in Nature (2022) contributing to global evidence on AI targeting performance in social protection.
Source Quality: 3 sources -- Academic journal article, Dataset / database, Report (multilateral / development partner)

The Government of Togo, in collaboration with academic research partners from the University of California Berkeley Center for Effective Global Action (CEGA), the Data-Intensive Development Lab, J-PAL, and Northwestern University, conducted a systematic evaluation of the machine learning model used to target emergency cash transfers under the Novissi social protection programme. This evaluation component -- distinct from the targeting model itself (documented separately as TGO-001) -- focused on monitoring the performance, fairness, and error behaviour of the phone-based poverty prediction algorithms that were used to prioritise the poorest individuals for COVID-19 relief payments between November 2020 and March 2021.

The Novissi programme was launched in April 2020 by Togo's Ministry of Digital Economy and Digital Transformation (MENTD) as an emergency cash transfer programme for informal sector workers affected by the COVID-19 pandemic. The first phase distributed payments to 572,852 informal workers in the greater Lomé area (World Bank, 2021). In a second phase, 57,000 new beneficiaries in the 100 poorest rural cantons were identified using machine learning algorithms trained on anonymised mobile phone metadata, specifically call detail records (CDR). The algorithms predicted individual consumption levels for 5.7 million subscribers, representing approximately 70 percent of Togo's population (World Bank, 2021; Aiken et al., 2022).

The performance monitoring and evaluation effort assessed how accurately the machine learning targeting model identified the poorest individuals compared to alternative targeting approaches. The evaluation used ground-truth consumption and wealth data collected through two independent datasets: a nationally representative field survey conducted in 2018-2019 (n = 6,171 households) and a large phone survey of active mobile subscribers conducted in 2020 (n = 8,915 individuals). These ground-truth datasets provided the benchmark against which model predictions were compared (Aiken et al., 2022).

The evaluation framework applied multiple performance metrics including classification accuracy (proportion of correctly identified poor and non-poor), precision (proportion of targeted individuals who are truly poor), exclusion error (proportion of truly poor individuals excluded from benefits), and inclusion error (proportion of non-poor individuals incorrectly receiving benefits). These metrics were calculated across different targeting thresholds and compared against several counterfactual targeting methods: geographic targeting at the prefecture and canton level, occupation-based targeting, and proxy means testing (PMT) -- the latter representing a hypothetical benchmark since no comprehensive social registry existed in Togo at the time (Aiken et al., 2022).
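The four metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation team's code: the function name `targeting_metrics`, the `quota` parameter (the share of the population targeted, standing in for a targeting threshold), the poverty line, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TargetingMetrics:
    accuracy: float
    precision: float
    exclusion_error: float
    inclusion_error: float


def targeting_metrics(true_consumption, predicted_consumption, poverty_line, quota):
    """Score a rule that targets the `quota` share with the lowest predictions.

    Hypothetical helper: "poor" means true consumption below `poverty_line`.
    """
    n = len(true_consumption)
    n_targeted = round(quota * n)
    # Rank individuals from lowest to highest predicted consumption.
    order = sorted(range(n), key=lambda i: predicted_consumption[i])
    targeted = set(order[:n_targeted])

    is_poor = [c < poverty_line for c in true_consumption]
    tp = sum(1 for i in range(n) if is_poor[i] and i in targeted)
    fp = sum(1 for i in range(n) if not is_poor[i] and i in targeted)
    fn = sum(1 for i in range(n) if is_poor[i] and i not in targeted)
    tn = n - tp - fp - fn

    return TargetingMetrics(
        # Share of all individuals classified correctly.
        accuracy=(tp + tn) / n,
        # Share of targeted individuals who are truly poor.
        precision=tp / (tp + fp) if tp + fp else 0.0,
        # Share of truly poor individuals left out.
        exclusion_error=fn / (tp + fn) if tp + fn else 0.0,
        # Share of non-poor individuals who receive benefits.
        inclusion_error=fp / (fp + tn) if fp + tn else 0.0,
    )


# Made-up consumption values for eight individuals (illustration only).
true_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
predicted_values = [1.5, 4.5, 2.5, 3.0, 4.0, 6.0, 7.0, 8.0]
example = targeting_metrics(true_values, predicted_values, poverty_line=3.5, quota=0.375)
```

In this toy case three people are targeted, two of whom are truly poor, so precision is 2/3 and exclusion error is 1/3. Repeating the call over a range of `quota` values gives the metrics across targeting thresholds.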

The evaluation found that the phone-based machine learning targeting approach reduced exclusion errors by 4 to 21 percent relative to the geographic targeting alternatives that were considered by the Government of Togo. However, relative to methods that would require a comprehensive social registry -- specifically proxy means testing based on detailed household survey data -- the machine learning approach increased exclusion errors by 9 to 35 percent. This finding was significant for informing future programme design, as it demonstrated that while phone-based ML targeting outperformed the feasible alternatives available during the crisis, it would not outperform traditional PMT if registry data were available (Aiken et al., 2022).
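To make the direction of these comparisons concrete, the sketch below computes a relative change in exclusion error, assuming the reported percentages are relative changes against each baseline. The function name `relative_change` and the error rates are hypothetical, chosen only to show the arithmetic; they are not figures from Aiken et al. (2022).

```python
def relative_change(ml_error, baseline_error):
    """Relative change in exclusion error of ML targeting versus a baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (ml_error - baseline_error) / baseline_error


# Hypothetical exclusion error rates (illustration only, not reported results).
example_ml = 0.45
example_geographic = 0.50
example_pmt = 0.40

# Negative value: ML excludes fewer poor people than the geographic baseline.
vs_geographic = relative_change(example_ml, example_geographic)
# Positive value: ML excludes more poor people than the PMT benchmark.
vs_pmt = relative_change(example_ml, example_pmt)
```

With these made-up rates, the ML approach shows roughly a 10 percent reduction against geographic targeting and a 12.5 percent increase against PMT, the same pattern of signs as the published comparison.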

A critical component of the evaluation addressed algorithmic fairness and potential bias. The research team planned in-person surveys for 2021 specifically designed to detect inadvertent predictive bias against vulnerable subgroups, including women, illiterate individuals, and other marginalised populations. The fairness analysis disaggregated performance metrics by demographic characteristics including gender, literacy status, and other marginalisation indicators to identify whether the model systematically underperformed for specific groups (World Bank, 2021; Aiken et al., 2022).
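A disaggregated check of this kind can be sketched as follows. This is an illustrative outline under stated assumptions, not the research team's method: the record layout `(group, is_poor, is_targeted)`, the function names, and the sample records are all hypothetical.

```python
from collections import defaultdict


def exclusion_error_by_group(records):
    """Return the exclusion error for each subgroup.

    Each record is a (group, is_poor, is_targeted) tuple. Groups with no
    truly poor members are omitted because the rate is undefined for them.
    """
    poor = defaultdict(int)
    excluded = defaultdict(int)
    for group, is_poor, is_targeted in records:
        if is_poor:
            poor[group] += 1
            if not is_targeted:
                excluded[group] += 1
    return {group: excluded[group] / poor[group] for group in poor}


def largest_gap(rates):
    """Difference between the worst and best subgroup rates."""
    return max(rates.values()) - min(rates.values())


# Made-up records: four poor women (two excluded), four poor men (one excluded),
# plus one non-poor individual who does not affect exclusion error.
sample_records = [
    ("women", True, True), ("women", True, True),
    ("women", True, False), ("women", True, False),
    ("men", True, True), ("men", True, True),
    ("men", True, True), ("men", True, False),
    ("men", False, True),
]
rates = exclusion_error_by_group(sample_records)
gap = largest_gap(rates)
```

A large gap between subgroups would flag that the model systematically underperforms for one of them. The same pattern applies to any of the other metrics, and to groupings such as literacy status.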

The machine learning models evaluated were classical ML algorithms -- specifically gradient boosted decision trees and regularised logistic regression models -- trained on 857 to 1,042 CDR features describing aspects of each subscriber's mobile phone behaviour. These features included call and SMS activity patterns, mobility indicators derived from cell tower locations, social network characteristics, mobile data consumption, and expenditure on airtime and mobile money (Aiken et al., 2022). The evaluation results were published in the peer-reviewed journal Nature in 2022, enabling independent external scrutiny of the methodology and findings. The research was conducted under academic institutional review board (IRB) protocols, though no specific regulatory or legal framework governing the monitoring component was documented in the primary sources.

Classifications follow the DCI AI Hub Taxonomy.

Social Protection Functions

Implementation/delivery chain
Monitoring and evaluation (primary)
Programme design
Eligibility criteria and qualifying conditions
SP Pillar (Primary): Social assistance
Programme Name: Novissi Emergency Cash Transfer Programme -- ML Performance Monitoring Component
Programme Type: Emergency Cash Transfers
System Level: Implementation/delivery chain
Programme Description: Systematic evaluation of the machine learning poverty prediction model used to target beneficiaries under the Novissi emergency cash transfer programme, assessing model accuracy, exclusion/inclusion errors, and algorithmic fairness across demographic subgroups.
Implementation Type: Classical ML
Lifecycle Stage: Monitoring, Maintenance and Decommissioning
Model Provenance: Adapted from open-source
Compute Environment: Not documented
Sovereignty Quadrant: IV -- Shared Innovation Zone
Data Residency: International
Data Residency Detail: Survey data collected in Togo; analysis conducted by US-based research institutions (UC Berkeley, Northwestern University)
Cross-Border Transfer: Without documented safeguards
Decision Criticality: Moderate
Human Oversight Type: HOTL (Human-on-the-Loop)
Development Process: Mix of in-house and third-party
Highest Risk Category: Model-related risks
Risk Assessment Status: Independent audit completed

Risk Dimensions

Market, sovereignty and industry structure risks
Model-related risks

Impact Dimensions

  • Bias audit
  • Independent evaluation
Category | Sensitivity | Cross-System Linkage | Availability | Key Constraints
Beneficiary registries and MIS | Personal | Links data across multiple systems | Currently available and used | Novissi beneficiary registry and mobile money payment records; used as reference for evaluating targeting outcomes.
Survey and census data | Personal | Links data across multiple systems | Currently available and used | Ground-truth consumption data from 2018-2019 nationally representative field survey (n=6,171) and 2020 phone survey (n=8,915); phone survey respondents skewed younger, poorer, and more male than general population.
Telecommunications and mobile data | Sensitive | Links data across multiple systems | Currently available and used | Anonymised call detail records (CDR) from mobile network operators; 857-1,042 features per subscriber including call/SMS patterns, mobility indicators, social network characteristics, mobile data consumption.

Aiken, E., Bellue, S., Karlan, D., Udry, C. and Blumenstock, J. (2022). Machine learning and phone data can improve targeting of humanitarian aid. Nature, 603, pp.864-870. DOI: 10.1038/s41586-022-04484-9. Available at: https://www.nature.com/articles/s41586-022-04484-9 (Accessed 24 Mar 2026).

Source type: Academic journal article

Aiken, E. (2022). togo-targeting-replication. GitHub repository. Available at: https://github.com/emilylaiken/togo-targeting-replication (Accessed 27 Mar 2026).

Source type: Dataset / database

World Bank (2021). Prioritizing the poorest and most vulnerable in West Africa: Togo's Novissi platform for social protection uses machine learning, geospatial analytics, and mobile phone metadata for the pandemic response. Washington, DC: World Bank. Available at: https://www.worldbank.org/en/results/2021/04/13/prioritizing-the-poorest-and-most-vulnerable-in-west-africa-togo-s-novissi-platform-for-social-protection-uses-machine-l (Accessed 24 Mar 2026).

Source type: Report (multilateral / development partner)
Deployment Status: Pilot / Controlled Trial Phase
Year Initiated: 2020
Scale / Coverage: National evaluation scope -- model predictions covered 5.7 million mobile subscribers (70% of population); ground-truth validation surveys covered approximately 15,000 individuals
Funding Source: World Bank IDA financing under the West Africa Unique Identification for Regional Integration and Inclusion (WURI) Programme
Technical Partners: UC Berkeley Center for Effective Global Action (CEGA); Data-Intensive Development Lab; J-PAL; Northwestern University; Innovations for Poverty Action (IPA)
Outcomes / Results: Phone-based ML targeting reduced exclusion errors by 4-21% relative to geographic targeting alternatives; traditional PMT would outperform ML if registry data were available (9-35% higher exclusion errors for ML vs PMT); findings published in Nature (2022) contributing to global evidence on AI targeting performance in social protection.
Challenges: No comprehensive social registry existed in Togo, limiting the feasibility of traditional PMT approaches; phone survey respondents skewed younger, poorer, and more male than the general population; fairness evaluation for vulnerable subgroups (women, illiterate, marginalised) required additional in-person surveys planned for 2021.

How to Cite

DCI AI Hub (2026). 'Novissi -- Machine Learning Model Performance Monitoring and Bias Detection', AI Hub AI Tracker, case TGO-004. Digital Convergence Initiative. Available at: https://socialprotectionai.org/use-case/TGO-004 [Accessed: 1 April 2026].

Change History

Created 30 Mar 2026, 08:41
by v2-import (import)