GAM(L)A: An econometric model for interpretable Machine Learning
GAM(L)A addresses the need for interpretable models in fields such as economics and regulation, though the contribution is incremental: it builds on existing partial linear and variable selection methods.
The authors tackle the problem of interpretability in machine learning by proposing GAM(L)A, a partial linear model that combines parametric and non-parametric functions to capture linear and non-linear effects, together with a variable selection procedure to control overfitting. The results show that GAM(L)A outperforms parametric models augmented with quadratic, cubic and interaction effects, and that its performance is not significantly different from that of random forest and gradient boosting.
Despite their high predictive performance, random forest and gradient boosting are often regarded as black boxes or uninterpretable models, which has raised concerns among practitioners and regulators. As an alternative, we propose in this paper to use partial linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted GAM(L)A for short. GAM(L)A combines parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between dependent and explanatory variables, and a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
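The abstract does not spell out the estimation details, so the following is only a minimal sketch of a generic double residual estimator with a lasso second step, not the authors' implementation. It assumes a partial linear model y = Xβ + g(z) + ε with a single non-parametric covariate z. The Nadaraya-Watson smoother, the bandwidth, the penalty level and all function names (kernel_smooth, lasso_cd, double_residual_fit) are illustrative assumptions, and the data are simulated.

```python
import numpy as np


def kernel_smooth(z, target, bandwidth=0.3):
    """Nadaraya-Watson estimate of E[target | z] at each sample point.

    Uses a Gaussian kernel; `target` may be a vector or a matrix of columns.
    The smoother and bandwidth are assumptions made for this sketch.
    """
    diffs = (z[:, None] - z[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of column j added back in.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n
            # Soft-thresholding update for coefficient j.
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return beta


def double_residual_fit(X, z, y, alpha=0.05):
    """Two-step double residual fit of y = X beta + g(z) + noise."""
    # Step 1: partial z out of y and out of every column of X.
    y_res = y - kernel_smooth(z, y)
    X_res = X - kernel_smooth(z, X)
    # Step 2: penalised regression of residuals on residuals selects
    # and estimates the linear coefficients.
    beta = lasso_cd(X_res, y_res, alpha)
    # Recover the non-parametric part by smoothing what the linear part leaves.
    g_hat = kernel_smooth(z, y - X @ beta)
    return beta, g_hat


# Simulated data (hypothetical): two active linear terms, three inactive ones,
# and a sinusoidal non-linear effect of z.
rng = np.random.default_rng(0)
n = 400
z = rng.uniform(-2, 2, n)
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + np.sin(2 * z) + 0.1 * rng.normal(size=n)

beta_hat, g_hat = double_residual_fit(X, z, y)
print(np.round(beta_hat, 3))
```

In this simulated setting the lasso step should retain the two active linear terms and shrink the remaining coefficients towards zero, while g_hat tracks the sinusoidal component. A practical application would choose the bandwidth and the penalty level by cross-validation rather than fixing them as done here.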