ML, LG · Nov 13, 2023

Explainable Boosting Machines with Sparsity -- Maintaining Explainability in High-Dimensional Settings

Brandon M. Greenwell, Annika Dahlmann, Saurabh Dhoble

arXiv:2311.07452v1 · 2 citations · h-index: 2

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for users who need interpretable models on high-dimensional data: it makes EBMs more practical while preserving their explainability.

The paper tackles the problem of explainable boosting machines (EBMs) losing transparency and efficiency in high-dimensional settings by proposing a LASSO-based method to introduce sparsity, which reduces model complexity and drastically improves scoring time.
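As a sketch of the idea, here is one way to write it down. The notation is ours, not the paper's, and it assumes a regression EBM with squared-error loss (a classification model would use a logistic loss instead). An EBM predicts with an additive sum of learned term functions $f_j$, each a main effect or a pairwise interaction. The LASSO step then refits one weight per term with an L1 penalty:

```latex
\hat{y}(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p} f_j(\mathbf{x}),
\qquad
(\hat{w}_0, \hat{\mathbf{w}}) = \arg\min_{w_0,\,\mathbf{w}}
  \frac{1}{2n} \sum_{i=1}^{n} \Big( y_i - w_0 - \sum_{j=1}^{p} w_j f_j(\mathbf{x}_i) \Big)^2
  + \lambda \sum_{j=1}^{p} \lvert w_j \rvert
```

Terms with $\hat{w}_j = 0$ are dropped. Each surviving term is rescaled to $\hat{w}_j f_j$, so larger $\lambda$ gives fewer terms and a faster-scoring model.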

Compared to "black-box" models, like random forests and deep neural networks, explainable boosting machines (EBMs) are considered "glass-box" models that can be competitively accurate while also maintaining a higher degree of transparency and explainability. However, EBMs readily become less transparent and harder to interpret in high-dimensional settings with many predictor variables; they also become more difficult to use in production due to increases in scoring time. We propose a simple solution based on the least absolute shrinkage and selection operator (LASSO) that can help introduce sparsity by reweighting the individual model terms and removing the less relevant ones, thereby allowing these models to maintain their transparency and relatively fast scoring times in higher-dimensional settings. In short, post-processing a fitted EBM with many (i.e., possibly hundreds or thousands of) terms using the LASSO can help reduce the model's complexity and drastically improve scoring time. We illustrate the basic idea using two real-world examples with code.
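The reweight-and-prune step can be sketched in pure Python as follows. This is a minimal illustration, not the authors' code. It assumes a squared-error loss and assumes the per-term contributions f_j(x_i) of a fitted EBM have already been extracted into a matrix with one row per observation and one column per term. The function names `soft_threshold` and `lasso_reweight` are hypothetical, and the toy matrix is made up for illustration rather than taken from the paper. The fit uses plain cyclic coordinate descent instead of any particular library's LASSO solver.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_reweight(term_scores, y, lam, n_iter=500, tol=1e-8):
    """Fit y ~ intercept + sum_j w_j * f_j with an L1 penalty on the term weights w.

    term_scores: list of rows; row i holds the (hypothetical, pre-extracted)
    per-term contributions f_j(x_i) of a fitted EBM.
    Returns (intercept, weights); a weight of exactly 0.0 means the term is dropped.
    """
    n, p = len(y), len(term_scores[0])
    # Center the response and each term column so the intercept is not penalized.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in term_scores) / n for j in range(p)]
    X = [[row[j] - col_means[j] for j in range(p)] for row in term_scores]
    resid = [v - y_mean for v in y]  # residual y_c - X w, with w = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    w = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant term carries no information
            # Correlation of term j with the partial residual that excludes term j.
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # weights have stopped moving
    intercept = y_mean - sum(w[j] * col_means[j] for j in range(p))
    return intercept, w


# Toy example: 4 observations, 3 hypothetical EBM terms (orthogonal columns).
scores = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
target = [5.5, 0.5, 4.5, 1.5]  # = 3 + 2*f_1 + 0.5*f_3
b0, weights = lasso_reweight(scores, target, lam=0.6)
kept_terms = [j for j, wj in enumerate(weights) if wj != 0.0]  # → [0]
```

With `lam=0.6`, only the first term survives, so a production scorer would need to evaluate one lookup per row instead of three. In practice `lam` would be chosen by cross-validation, trading accuracy against the number of retained terms.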

