EM · ML · Mar 16, 2018

Evaluating Conditional Cash Transfer Policies with Machine Learning Methods

arXiv:1803.06401v1 · 1 citation

Originality: Synthesis-oriented

AI Analysis

The paper addresses the adoption of machine learning in economic policy analysis. It shows the approach's potential in big-data contexts and highlights its limitations with small data. The contribution is incremental, since it builds on existing methods in economics.

This paper compares five machine learning models (CART, C4.5, LASSO, random forest, and AdaBoost) with a structural econometric model. The task is predicting school attendance outcomes in the Progresa conditional cash transfer program in Mexico. In out-of-sample forecasts, the machine learning models achieve lower errors on both mean absolute error and root mean square error. In long-term within-sample simulations, where data are limited, the structural model performs better.
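The two error metrics named above have standard definitions. In the formulas below, $y_g$ is the observed school attendance rate for subgroup $g$ and $\hat{y}_g$ is a model's predicted rate. Indexing over $G$ subgroups is an assumption made for illustration. The paper may aggregate over a different unit.

```latex
\[
\mathrm{MAE} = \frac{1}{G}\sum_{g=1}^{G}\bigl|\hat{y}_g - y_g\bigr|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{G}\sum_{g=1}^{G}\bigl(\hat{y}_g - y_g\bigr)^{2}}
\]
```

RMSE squares each error before averaging, so it is never smaller than MAE. It also penalizes a few large misses more heavily than many small ones.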

This paper presents an out-of-sample prediction comparison between major machine learning models and a structural econometric model. Over the past decade, machine learning has established itself as a powerful tool in many prediction applications, but the approach is still not widely adopted in empirical economic studies. To evaluate its benefits, I use five of the most common machine learning algorithms (CART, C4.5, LASSO, random forest, and AdaBoost) to construct prediction models for a cash transfer experiment conducted by the Progresa program in Mexico. I then compare the prediction results with those of a previous structural econometric study. Two prediction tasks are performed: an out-of-sample forecast and a long-term within-sample simulation. For the out-of-sample forecast, every machine learning model yields a smaller mean absolute error and root mean square error of school attendance rates than the structural model. Random forest and AdaBoost achieve the highest accuracy for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model outperforms all of the machine learning models. The poor within-sample fit of the machine learning models stems from the inaccuracy of the income and pregnancy prediction models. These results show that machine learning outperforms the structural model when there are ample data to learn from. When data are limited, however, the structural model offers more sensible predictions. The findings show promise for adopting machine learning in economic policy analysis in the era of big data.
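To make the forecast comparison concrete, here is a minimal sketch that computes mean absolute error and root mean square error for predicted subgroup attendance rates. The observed rates and the two prediction vectors (`model_a`, `model_b`) are hypothetical numbers invented for illustration, not results from the paper. The helper names `mae` and `rmse` are likewise illustrative.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    # Both metrics need paired, non-empty inputs.
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted rates."""
    _check(observed, predicted)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: squares errors before averaging, then takes the root."""
    _check(observed, predicted)
    mean_sq = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mean_sq)


# Hypothetical subgroup attendance rates (NOT the paper's data).
observed = [0.90, 0.80, 0.60, 0.50]
model_a = [0.88, 0.83, 0.62, 0.45]  # small errors on every subgroup
model_b = [0.85, 0.70, 0.70, 0.55]  # larger misses on two subgroups

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: MAE={mae(observed, preds):.4f}, RMSE={rmse(observed, preds):.4f}")
```

Reporting both metrics, as the paper does, shows whether one model wins uniformly or only by avoiding a few large subgroup errors.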

