ML · LG · Nov 8, 2022

Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of $\ell_2$ Regularization

arXiv:2211.04409v1 · 2 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific limitation in interpretability for gradient boosted trees, offering improved feature attribution methods that account for regularization, which is incremental but important for practitioners in fields like genomics.

The paper addresses the problem that existing feature attribution methods for gradient boosted trees ignore the effect of ℓ₂ regularization during training. It introduces PreDecomp, an individualized attribution method whose inner product with the labels on in-sample data is essentially the total gain of a tree, and which recovers additive models in the population case when features are independent. TreeInner, a family of debiased global attributions motivated by this connection, achieves state-of-the-art feature selection performance in experiments on a simulated dataset and a genomic ChIP dataset.

While $\ell_2$ regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees when they are trained with $\ell_2$ regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that it can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner has state-of-the-art feature selection performance. Code reproducing experiments is available at https://github.com/nalzok/TreeInner .
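As a rough illustration of the inner-product idea described in the abstract, the following minimal sketch computes a global score per feature from per-tree individualized attributions and labels on held-out data. It is not the authors' implementation: the function name `inner_product_importance`, the array layout, the toy numbers, and the choice to sum the per-tree inner products are all assumptions made for illustration. The attributions are taken as given and could come from any individualized method.

```python
import numpy as np


def inner_product_importance(attributions, labels):
    """Hypothetical helper: global feature scores from per-tree attributions.

    attributions: array of shape (n_trees, n_samples, n_features), the
        individualized attribution of each feature for each held-out
        sample, computed separately for each tree.
    labels: array of shape (n_samples,), labels of the held-out samples.
    """
    attributions = np.asarray(attributions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Inner product between attributions and labels, one row per tree.
    per_tree = np.einsum("tnf,n->tf", attributions, labels)
    # Assumption: per-tree scores are combined by summation.
    return per_tree.sum(axis=0)


# Toy data (made up): 2 trees, 3 held-out samples, 2 features.
attributions = np.array([
    [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]],
    [[0.5, 0.1], [-0.5, -0.1], [0.5, 0.1]],
])
labels = np.array([1.0, -1.0, 1.0])

scores = inner_product_importance(attributions, labels)
print(scores)
```

In this toy example the first feature's attributions track the labels closely, so it receives the larger score. Computing the inner product on held-out rather than training data is what the abstract describes as the source of debiasing.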

Code Implementations: 1 repo
