LG · Nov 16, 2022

Can Strategic Data Collection Improve the Performance of Poverty Prediction Models?

Berkeley
arXiv:2211.08735v1 · 3 citations · h-index: 33
Originality: Synthesis-oriented
AI Analysis

This work addresses how to optimize data collection for poverty prediction, a task that is crucial for humanitarian aid targeting. Its contribution is incremental, however, because the adaptive strategies it tests show no improvement over existing sampling methods.

The study tested whether adaptive sampling strategies for ground truth data collection could improve poverty prediction models, but found that none of the active learning methods outperformed uniform-at-random sampling.

Machine learning-based estimates of poverty and wealth are increasingly being used to guide the targeting of humanitarian aid and the allocation of social assistance. However, the ground truth labels used to train these models are typically borrowed from existing surveys that were designed to produce national statistics -- not to train machine learning models. Here, we test whether adaptive sampling strategies for ground truth data collection can improve the performance of poverty prediction models. Through simulations, we compare the status quo sampling strategies (uniform at random and stratified random sampling) to alternatives that prioritize acquiring training data based on model uncertainty or model performance on sub-populations. Perhaps surprisingly, we find that none of these active learning methods improve over uniform-at-random sampling. We discuss how these results can help shape future efforts to refine machine learning-based estimates of poverty.
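To make the comparison described in the abstract concrete, the following is a minimal sketch of a pool-based sampling simulation. It is not the authors' code: the synthetic data, the ridge regressor, the bootstrap-variance uncertainty score, and every function name (`make_synthetic`, `fit_ridge`, `select_uniform`, `select_uncertain`, `simulate`) are assumptions chosen only for illustration. It shows the general shape of such an experiment: start from a small labeled set, repeatedly acquire a batch of labels either uniformly at random or by model uncertainty, and track test error after each round.

```python
import numpy as np

def make_synthetic(n=500, d=5, seed=0):
    """Hypothetical stand-in for survey data: linear signal plus noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
    return X, y

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; an intercept column is appended."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def select_uniform(rng, pool, k):
    """Status quo: draw k unlabeled units uniformly at random."""
    return rng.choice(pool, size=k, replace=False)

def select_uncertain(rng, X, y, labeled, pool, k, n_boot=20):
    """Assumed uncertainty score: variance of predictions across
    models fit on bootstrap resamples of the labeled set."""
    preds = []
    for _ in range(n_boot):
        boot = rng.choice(labeled, size=len(labeled), replace=True)
        preds.append(predict(fit_ridge(X[boot], y[boot]), X[pool]))
    variance = np.var(np.stack(preds), axis=0)
    return pool[np.argsort(variance)[-k:]]  # k most uncertain pool units

def simulate(strategy, X, y, X_test, y_test,
             n_init=20, k=10, rounds=5, seed=0):
    """Run one acquisition simulation; return per-round test MSE
    and the final set of labeled indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    labeled, pool = idx[:n_init], idx[n_init:]
    errors = []
    for _ in range(rounds):
        if strategy == "uniform":
            new = select_uniform(rng, pool, k)
        else:
            new = select_uncertain(rng, X, y, labeled, pool, k)
        labeled = np.concatenate([labeled, new])
        pool = np.setdiff1d(pool, new)
        w = fit_ridge(X[labeled], y[labeled])
        errors.append(float(np.mean((predict(w, X_test) - y_test) ** 2)))
    return errors, labeled

if __name__ == "__main__":
    X, y = make_synthetic()
    X_tr, y_tr, X_te, y_te = X[:400], y[:400], X[400:], y[400:]
    for name in ("uniform", "uncertainty"):
        errs, _ = simulate(name, X_tr, y_tr, X_te, y_te)
        print(name, [round(e, 3) for e in errs])
```

A single run on toy data says nothing about which strategy is better. A fair comparison would average over many seeds and use realistic features and labels, which is the kind of evaluation the paper reports.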

