Policymakers need up-to-date poverty data to make informed decisions on budgets, program targeting, and the evaluation of economic outcomes. In many countries, however, the most recent household survey that contains consumption data is outdated, while newer surveys often omit consumption altogether. As a result, poverty has to be measured indirectly, and key policy choices are made without reliable current information on living standards, including whether labor markets are improving incomes and job quality.
A widely used solution to this gap is survey-to-survey (S2S) imputation, where statistical models are trained on older surveys with full consumption data and then applied to newer surveys without it. This approach enables faster and more cost-effective poverty monitoring, but it depends on a critical assumption: that relationships between household characteristics and welfare remain stable over time. When these relationships shift, especially during major economic disruptions, the method can become unreliable.
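The mechanics of S2S imputation, and its dependence on the stability assumption, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any challenge participant: the data are synthetic, the model is a plain least-squares regression of log consumption on three made-up household characteristics, and the names `fit_ols`, `impute_poverty_rate`, and `log_line` are hypothetical. The "shock" scenario lowers every household's log consumption by a fixed amount that the survey covariates do not capture.

```python
import numpy as np

def fit_ols(X, y):
    """Fit log consumption on household characteristics by least squares."""
    A = np.column_stack([np.ones(len(X)), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma = resid.std(ddof=A.shape[1])             # residual standard deviation
    return beta, sigma

def impute_poverty_rate(beta, sigma, X_new, log_line, n_draws=200, seed=0):
    """Headcount rate averaged over simulated log-consumption draws."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(X_new)), X_new])
    mu = A @ beta
    # Add residual noise so the imputed distribution keeps its spread.
    draws = mu[None, :] + rng.normal(0.0, sigma, size=(n_draws, len(mu)))
    return float((draws < log_line).mean())

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
n = 5000
slopes = np.array([0.5, -0.3, 0.2])

# "Older" survey: characteristics and consumption are both observed.
X_old = rng.normal(size=(n, 3))
y_old = 1.0 + X_old @ slopes + rng.normal(0.0, 0.4, n)
beta, sigma = fit_ols(X_old, y_old)

# "Newer" survey: only characteristics are observed by the model.
X_new = rng.normal(size=(n, 3))
y_new_stable = 1.0 + X_new @ slopes + rng.normal(0.0, 0.4, n)
y_new_shock = y_new_stable - 0.2   # uniform income loss not visible in X_new

log_line = 0.5                     # hypothetical poverty line in log units
imputed = impute_poverty_rate(beta, sigma, X_new, log_line)
actual_stable = float((y_new_stable < log_line).mean())
actual_shock = float((y_new_shock < log_line).mean())
print(f"imputed={imputed:.3f} stable={actual_stable:.3f} shock={actual_shock:.3f}")
```

In this toy setup the imputed rate tracks the true rate closely when the relationship is unchanged, and understates poverty once the shock is applied, because the model sees the same covariates in both scenarios.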
To study this issue in real-world conditions, the World Bank and DrivenData organized a Survey-to-Survey Imputation Challenge. Participants built models using historical survey data and then predicted poverty levels in newer datasets where consumption was hidden. The evaluation was designed to mimic real policy uncertainty by splitting data into validation and test sets, with feedback available only for the validation portion. Importantly, validation data came from a pre-COVID period, while test data reflected post-COVID economic conditions, introducing structural change between the two.
After the competition, predictions were compared against actual poverty rates across different segments of the consumption distribution. This made it possible to assess not only predictive accuracy but also how well participants could select their best model under uncertainty. A key finding was that model selection is itself highly challenging. Many participants did not choose their best-performing model, and even those whose best model would have placed them near the top dropped considerably in the rankings because of suboptimal selection. In real policy settings, this means that even a team with strong models can publish misleading conclusions if it chooses the wrong specification.
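Comparing predicted and actual poverty rates at several points of the consumption distribution can be sketched as follows. This is an assumed, simplified scoring rule for illustration, not the challenge's official metric; the functions `headcount_rates` and `mean_abs_rate_error`, the thresholds, and the toy consumption values are all hypothetical.

```python
import numpy as np

def headcount_rates(consumption, thresholds):
    """Share of households below each consumption threshold."""
    consumption = np.asarray(consumption, dtype=float)
    return np.array([(consumption < t).mean() for t in thresholds])

def mean_abs_rate_error(predicted, actual, thresholds):
    """Average absolute gap between predicted and actual headcount rates."""
    gap = headcount_rates(predicted, thresholds) - headcount_rates(actual, thresholds)
    return float(np.abs(gap).mean())

# Toy data: the model overpredicts every household's consumption by 1 unit.
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
predicted = actual + 1.0
thresholds = [2.5, 5.5, 8.5]       # low, middle, and high poverty lines

actual_rates = headcount_rates(actual, thresholds)
predicted_rates = headcount_rates(predicted, thresholds)
error = mean_abs_rate_error(predicted, actual, thresholds)
print(actual_rates, predicted_rates, error)
```

Scoring at several thresholds rather than one rewards models that reproduce the shape of the distribution, not just the rate at a single poverty line.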
Another important insight was that strong validation performance does not guarantee strong test performance. Validation results generally provided a useful signal, but small differences in validation scores often did not carry over to the test data, particularly because the underlying economic relationships had changed after the pandemic. Relying too heavily on validation rankings can therefore lead to incorrect model selection when structural shifts are present.
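The cost of selecting on validation scores alone can be expressed as a simple regret calculation. The sketch below assumes lower error is better; the model names and error values are invented for illustration and are not results from the challenge, and `selection_regret` is a hypothetical helper.

```python
def selection_regret(val_errors, test_errors):
    """Compare the model chosen on validation with the best model on test.

    Both arguments map a model name to an error (lower is better).
    """
    chosen = min(val_errors, key=val_errors.get)      # what a participant would submit
    best = min(test_errors, key=test_errors.get)      # what they should have submitted
    ranked = sorted(test_errors, key=test_errors.get)
    return {
        "chosen": chosen,
        "best_on_test": best,
        "regret": test_errors[chosen] - test_errors[best],
        "test_rank_of_chosen": ranked.index(chosen) + 1,
    }

# Hypothetical errors: validation differences are small, test differences are not.
val_errors = {"model_a": 0.031, "model_b": 0.029, "model_c": 0.035}
test_errors = {"model_a": 0.040, "model_b": 0.058, "model_c": 0.044}

result = selection_regret(val_errors, test_errors)
print(result)
```

Here a narrow validation lead sends the selection to the model that performs worst on the test data, which is the pattern the paragraph above describes.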
The challenge also revealed that predicting poverty during periods of economic shock is particularly difficult. Many models incorrectly suggested falling poverty when in reality poverty had increased. This happened because standard survey variables such as employment status failed to capture deeper income shocks like wage declines. As a result, models trained on pre-crisis relationships often produced systematically biased results under post-crisis conditions.
These findings have important implications for real-time poverty monitoring. Errors in poverty estimation can significantly affect policy decisions, including which populations are prioritized and how social protection systems are designed and expanded. The study shows that S2S imputation works reasonably well when economic relationships are stable, but becomes fragile during structural changes, where both prediction error and model selection risk increase simultaneously.
Overall, the results show that improving technical models alone is not sufficient. Greater attention is needed to how models are selected, how uncertainty is evaluated, and how results are communicated. In practice, combining statistical performance with economic reasoning, contextual knowledge, and external macroeconomic indicators is essential for producing reliable and responsible poverty estimates in real time.







