Causal Gradient Boosting: Boosted Instrumental Variable Regression
This work addresses endogeneity bias in machine learning, a concern for both researchers and practitioners. It offers a data-driven solution that improves on recent methods, although the contribution is incremental because it builds on gradient boosting and 2SLS.
The paper tackles endogeneity bias in supervised learning by proposing boostIV, an instrumental variable regression algorithm based on gradient boosting that corrects for this bias. The authors show that boostIV is consistent under mild conditions. In finite-sample Monte Carlo simulations it performs at worst on par with existing methods and on average significantly outperforms them.
Recent advances in the literature have demonstrated that standard supervised learning algorithms are ill-suited for problems with endogenous explanatory variables. To correct for the endogeneity bias, many variants of nonparametric instrumental variable regression methods have been developed. In this paper, we propose an alternative algorithm called boostIV that builds on the traditional gradient boosting algorithm and corrects for the endogeneity bias. The algorithm is intuitive and resembles an iterative version of the standard 2SLS estimator. Moreover, our approach is data driven, meaning that the researcher does not have to take a stance on either the form of the target function approximation or the choice of instruments. We demonstrate that our estimator is consistent under mild conditions. We carry out extensive Monte Carlo simulations to compare the finite sample performance of our algorithm with that of other recently developed methods. We show that boostIV is at worst on par with the existing methods and on average significantly outperforms them.
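The intuition of an "iterative version of 2SLS" can be illustrated with a minimal sketch. This is not the boostIV algorithm itself: it assumes a single endogenous regressor, a single instrument, a linear base learner, and zero-mean data with no intercept. The data-generating process and all function names (`simulate`, `ols_slope`, `tsls_slope`, `boosted_iv_slope`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate(n=5000, beta=1.0, seed=0):
    """Hypothetical DGP: x is endogenous because it shares the confounder u with y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)            # instrument: affects y only through x
    u = rng.normal(size=n)            # unobserved confounder
    x = z + u + rng.normal(size=n)    # endogenous regressor
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return x, y, z


def ols_slope(x, y):
    """Naive least-squares slope (no intercept); biased when x is endogenous."""
    return float(x @ y / (x @ x))


def tsls_slope(x, y, z):
    """Standard 2SLS: project x on z, then regress y on the projection."""
    x_hat = z * (z @ x) / (z @ z)     # first-stage fitted values
    return float(x_hat @ y / (x_hat @ x_hat))


def boosted_iv_slope(x, y, z, n_rounds=200, learning_rate=0.1):
    """Stagewise sketch: repeatedly fit the current residuals on the
    instrumented regressor and take a shrunken step, as in boosting."""
    x_hat = z * (z @ x) / (z @ z)
    beta = 0.0
    for _ in range(n_rounds):
        resid = y - beta * x                      # residuals of the current fit
        step = (x_hat @ resid) / (x_hat @ x)      # IV fit of the residuals
        beta += learning_rate * step              # shrunken update
    return float(beta)


if __name__ == "__main__":
    x, y, z = simulate()
    print("OLS slope:       ", ols_slope(x, y))
    print("2SLS slope:      ", tsls_slope(x, y, z))
    print("Boosted IV slope:", boosted_iv_slope(x, y, z))
```

In this linear toy setting each boosting step moves the coefficient a fraction `learning_rate` of the way toward the 2SLS estimate, so the iterations converge to 2SLS, while the naive OLS slope stays biased away from the true value of 1. The nonparametric, data-driven choice of base learners and instruments described in the abstract is not covered by this sketch.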