This case concerns a body of applied research led by World Bank Poverty and Equity Global Practice researchers and academic partners, which develops and evaluates machine-learning-based geospatial poverty mapping methods to support village-level geographical targeting of social assistance programmes in Malawi. The research programme spans multiple studies published between 2022 and 2025, addressing a core operational challenge in low-income countries: how to identify the poorest villages for geographic targeting when comprehensive household-level data are unavailable or outdated.
The primary methodological contribution, documented in the World Bank Economic Review (Gualavisi and Newhouse, 2025, Vol. 39, No. 2, pp. 377-409), introduces a cost-effective strategy that combines household consumption survey data, publicly available geospatial indicators, and a simulated partial registry to produce village-level poverty estimates. The simulated partial registry covers 450 villages across 10 impoverished districts in Malawi and contains proxy poverty indicators collected at the household level. These proxy indicators are used to impute household per capita consumption, and the imputed values in turn serve as training data for a model that predicts welfare from publicly available geospatial data. The machine-learning approach employs XGBoost (extreme gradient boosting), a tree-based ensemble algorithm, to integrate the survey and geospatial features for prediction. The geospatial predictors are publicly available, largely satellite-derived indicators such as night-time light intensity, the normalised difference vegetation index (NDVI), land cover classifications, road network density, population density estimates, and building footprint data. The household survey data come from the Malawi Integrated Household Survey (IHS), a nationally representative consumption and expenditure survey conducted by the National Statistical Office of Malawi with World Bank support.
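The three-step flow described above (impute consumption from proxy indicators, average to village level, train a geospatial model on the result) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: all data are synthetic, each step uses a single predictor, and a one-variable least-squares fit stands in for XGBoost so that the example needs only the Python standard library. The names `fit_line`, `predict_village_welfare`, and the "geospatial index" are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Step 1: survey households have a proxy score AND measured log consumption.
# (Synthetic data; the true relationship is an assumption of this sketch.)
survey_proxy = [rng.uniform(0, 10) for _ in range(200)]
survey_logcons = [5.0 + 0.3 * p + rng.gauss(0, 0.2) for p in survey_proxy]
imp_intercept, imp_slope = fit_line(survey_proxy, survey_logcons)

# Step 2: registry households have the proxy score only. Impute consumption
# for each household, then average within each village.
village_geo, village_welfare = [], []
for _ in range(30):
    geo = rng.uniform(0, 1)  # hypothetical geospatial index, e.g. night lights
    proxies = [2 + 6 * geo + rng.gauss(0, 1) for _ in range(20)]
    imputed = [imp_intercept + imp_slope * p for p in proxies]
    village_geo.append(geo)
    village_welfare.append(mean(imputed))

# Step 3: train the geospatial model on the imputed village means.
# (Stand-in for XGBoost: a straight line on one feature.)
geo_intercept, geo_slope = fit_line(village_geo, village_welfare)


def predict_village_welfare(geo):
    """Predict log consumption for a village outside the registry."""
    return geo_intercept + geo_slope * geo
```

In a realistic setting the final step would use many geospatial features and a non-linear learner; the point of the sketch is only the order of operations, in which the registry supplies village-level training labels that the survey alone cannot.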
The key quantitative finding is that the partial registry model achieves a rank correlation of 0.75 with actual village-level welfare measures. This substantially outperforms three alternative approaches, namely proxy means test (PMT) scores, the Meta Relative Wealth Index, and predictions from household survey data combined with geospatial indicators alone, which produced rank correlations ranging from -0.02 to 0.2. The results hold under various robustness checks, including the addition of Gaussian noise to the proxy poverty indicators, demonstrating that even imperfect household-level data from a partial registry significantly improve the accuracy of geospatial poverty predictions for village-level geographic targeting.
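For readers unfamiliar with the metric, a rank correlation (assumed here to be Spearman's) compares the ordering of villages rather than their welfare levels, which is what matters for targeting. The sketch below computes it from scratch and includes a small helper for a noise-based robustness check. The village values are invented for illustration, and the helper perturbs the scores directly, which is a simplification of perturbing the underlying proxy indicators.

```python
import random
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def add_gaussian_noise(values, sd, rng):
    """Return a copy of `values` with independent N(0, sd^2) noise added."""
    return [v + rng.gauss(0, sd) for v in values]


# Hypothetical village welfare values (not taken from the study).
actual = [1.2, 0.8, 2.5, 1.9, 0.5]
predicted = [1.0, 0.9, 2.0, 2.2, 0.4]

rho = spearman(actual, predicted)  # 0.9: only two adjacent villages swap
rho_noisy = spearman(actual, add_gaussian_noise(predicted, 0.3, random.Random(1)))
```

Repeating the noisy calculation over many seeds and noise levels gives a rough sense of how quickly the ranking degrades as the inputs become less reliable.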
A complementary study by van der Weide, Blankespoor, Elbers, and Lanjouw, published in the Journal of Development Economics (2024, Vol. 167), directly evaluates the accuracy of poverty maps based on remote sensing data alone in Malawi. The study first obtains small area estimates (SAE) of poverty by combining household expenditure survey data with population census data as a benchmark, then produces a second poverty map using only survey data combined with predictors derived from remote sensing. The two approaches reveal similar broad geographic poverty patterns, but the remote-sensing-based maps are less reliable for estimates of specific small areas. The study concludes that remote-sensing-based poverty maps may perform adequately for comparing poverty between assemblies of areas but should be used with caution when the focus is on estimates for individual small areas.
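The contrast between small-area and aggregate accuracy follows from errors partly cancelling when areas are pooled. The sketch below illustrates this with four invented areas in two hypothetical regions; the populations and poverty rates are made up for the example and are not taken from the study.

```python
from collections import defaultdict

# (region, population, benchmark poverty rate, remote-sensing poverty rate)
# All values are hypothetical.
areas = [
    ("North", 1000, 0.60, 0.50),
    ("North", 1000, 0.40, 0.52),
    ("South", 2000, 0.70, 0.78),
    ("South", 2000, 0.80, 0.70),
]


def mean_abs_error(pairs):
    """Mean absolute difference between paired estimates."""
    pairs = list(pairs)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def aggregate_by_region(rows):
    """Population-weighted poverty rates per region for both maps."""
    pop = defaultdict(float)
    bench = defaultdict(float)
    remote = defaultdict(float)
    for region, n, p_bench, p_remote in rows:
        pop[region] += n
        bench[region] += n * p_bench
        remote[region] += n * p_remote
    return {r: (bench[r] / pop[r], remote[r] / pop[r]) for r in pop}


# Error for individual small areas: about 0.10 (ten percentage points).
area_mae = mean_abs_error((b, r) for _, _, b, r in areas)

# Error after pooling areas into regions: about 0.01 (one percentage point).
regions = aggregate_by_region(areas)
region_mae = mean_abs_error(regions.values())
```

In this toy example the two maps disagree by ten percentage points on average for individual areas but by only one point at regional level, which mirrors the qualitative pattern the study reports.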
A further methodological contribution is provided by the United Nations ESCAP Statistics Division (2024), which developed a geospatial small area estimation how-to guide using Northern Malawi as a worked example. This guide demonstrates how geospatial indicators can be integrated with survey data using small area estimation techniques to produce poverty estimates at fine geographic scales, with evidence suggesting that combining geospatial data with surveys increases the precision of poverty estimates by an amount equivalent to expanding the survey sample by a factor of 3 to 7, depending on the context and indicator.
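The "equivalent sample expansion" figure can be translated into a change in standard errors. Assuming the textbook case in which the variance of a direct survey estimate is inversely proportional to sample size, an expansion factor of k corresponds to dividing the standard error by the square root of k. The function names below are illustrative, and the calculation is a sketch under that assumption rather than the guide's own method.

```python
import math


def equivalent_sample_factor(se_direct, se_model):
    """Sample expansion that would give the direct estimate the model's precision.

    Assumes variance is inversely proportional to sample size.
    """
    return (se_direct / se_model) ** 2


def standard_error_reduction(factor):
    """Proportional fall in standard error implied by an expansion factor."""
    return 1 - 1 / math.sqrt(factor)


low = standard_error_reduction(3)   # about 0.42, i.e. a 42% smaller standard error
high = standard_error_reduction(7)  # about 0.62, i.e. a 62% smaller standard error
```

Under this assumption, the reported range of 3 to 7 corresponds to standard errors that are roughly 42% to 62% smaller than those of direct survey estimates.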
The technical approach across these studies uses traditional machine learning rather than deep learning or foundation models. The primary algorithms include XGBoost (gradient-boosted decision trees), along with other ensemble and regression methods. The models are trained on combinations of household survey variables and geospatial features, with the prediction target being village or primary sampling unit (PSU) level consumption or poverty estimates. No automated eligibility decisions are made; the outputs are informational poverty maps and rankings used by human analysts and policymakers to inform geographic targeting decisions for social assistance programmes.
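To make the role of the rankings concrete, the sketch below turns predicted village welfare into a shortlist of the poorest villages and checks how much that shortlist overlaps with one built from benchmark welfare. The village names, values, 40% share, and function names are all hypothetical, and the output is a list for analysts to review, not an eligibility decision.

```python
def shortlist_poorest(welfare_by_village, share):
    """Return the poorest `share` of villages, poorest first."""
    n_selected = max(1, int(len(welfare_by_village) * share))
    ranked = sorted(welfare_by_village, key=welfare_by_village.get)
    return ranked[:n_selected]


def overlap_rate(shortlist, benchmark):
    """Share of shortlisted villages that also appear in the benchmark list."""
    return len(set(shortlist) & set(benchmark)) / len(shortlist)


# Hypothetical predicted and benchmark welfare per village.
predicted = {"A": 1.2, "B": 0.8, "C": 2.5, "D": 1.9, "E": 0.5}
actual = {"A": 0.7, "B": 0.9, "C": 2.4, "D": 2.0, "E": 0.6}

shortlist = shortlist_poorest(predicted, share=0.4)  # ['E', 'B']
benchmark = shortlist_poorest(actual, share=0.4)     # ['E', 'A']
overlap = overlap_rate(shortlist, benchmark)         # 0.5
```

An overlap measure of this kind complements the rank correlation: it asks directly how many of the villages that would be prioritised are ones the benchmark also identifies as poorest.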
The research is situated within the context of Malawi's social assistance system, where geographic targeting is used to prioritise districts and communities for programme coverage. Malawi is a low-income country in Sub-Saharan Africa where approximately half the population lives below the national poverty line. The social protection system includes several poverty-targeted cash transfer programmes, most notably the Social Cash Transfer Programme (SCTP) and the Malawi Social Action Fund (MASAF), both of which use geographic and community-based targeting mechanisms to identify beneficiaries. The poverty maps produced by these research methods could directly inform the first stage of this targeting process, helping to identify which villages and areas should receive priority coverage.
The implementing agencies include the World Bank Poverty and Equity Global Practice research team. Key researchers are Melany Gualavisi and David Newhouse for the World Bank Economic Review (WBER) study, and Roy van der Weide, Brian Blankespoor, Chris Elbers, and Peter Lanjouw for the Journal of Development Economics (JDE) study. The UN ESCAP guide was produced by the Statistics Division in collaboration with academic partners. No operational deployment of these models within Malawi's social protection delivery system has been documented; all work remains at the research and development stage.
Human oversight is inherent in the research context: analysts develop, validate, and interpret the poverty maps, and no automated eligibility or benefit decisions are made. The decision criticality is low because the outputs serve as informational inputs to geographic targeting decisions rather than directly determining individual eligibility. The primary risks relate to data quality and representation: the accuracy of poverty predictions depends on the quality and coverage of both the household survey data and the geospatial indicators. There are also documented concerns about the reliability of remote-sensing-based estimates for specific small areas.