LG ME ML | Jun 10, 2025

Local MDI+: Local Feature Importances for Tree-Based Models

Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

arXiv:2506.08928v17.11 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses interpretability needs in high-stakes domains that rely on tree-based models, offering a more stable and accurate local feature importance method. The advance is incremental because it builds on an existing global framework, MDI+.

The paper tackles the problem of explaining individual predictions from tree-based models by proposing Local MDI+ (LMDI+), which extends the global feature importance method MDI+ to the sample-specific setting. It outperforms the existing methods LIME and TreeSHAP, with an average 10% improvement in downstream task performance across twelve real-world datasets.

Tree-based ensembles such as random forests remain the go-to choice for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e., sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method that exploits an equivalence between decision trees and linear models on a transformed node basis. However, the global MDI+ scores cannot explain individual predictions when samples have heterogeneous characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample-specific setting. LMDI+ outperforms the existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.
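
The equivalence between a decision tree and a linear model on a node basis, mentioned in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single hand-built regression tree, toy data, plain least squares, and a normalized stump feature per internal node (positive for samples sent left, negative for samples sent right, zero elsewhere). All names here (`fit_node_basis`, `local_contributions`, `TREE`) are hypothetical. Grouping each node's term by its split feature gives a simple sample-specific attribution in the spirit of a local score; the actual LMDI+ definition may differ.

```python
from math import sqrt

# Toy training set (illustrative values only): two features, one response.
X = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0), (5.0, 6.0), (6.0, 3.0)]
y = [1.0, 2.0, 2.0, 5.0, 7.0, 8.0]

# Hand-built tree. Internal node: (feature index, threshold, left, right); leaf: None.
TREE = (0, 3.5, (1, 3.0, None, None), (0, 4.5, None, None))


def fit_node_basis(node, idx):
    """Attach a least-squares coefficient to the stump feature of each internal node.

    Assumed stump feature for a node with n_l / n_r training samples per child:
    psi(x) = n_r / sqrt(n_l * n_r) if x goes left, -n_l / sqrt(n_l * n_r) if right,
    and 0 if x never reaches the node. On the training set these features are
    centered and mutually orthogonal, so each coefficient is <psi, y> / <psi, psi>,
    where <psi, psi> = n_l + n_r.
    """
    if node is None:
        return None
    feat, thr, left, right = node
    li = [i for i in idx if X[i][feat] <= thr]
    ri = [i for i in idx if X[i][feat] > thr]
    n_l, n_r = len(li), len(ri)
    scale = sqrt(n_l * n_r)
    beta = (n_r * sum(y[i] for i in li) - n_l * sum(y[i] for i in ri)) / (scale * (n_l + n_r))
    return {"feat": feat, "thr": thr, "n_l": n_l, "n_r": n_r, "beta": beta,
            "left": fit_node_basis(left, li), "right": fit_node_basis(right, ri)}


def local_contributions(fitted, x, n_features):
    """Sum beta * psi(x) along x's path, grouped by the feature each node splits on."""
    contrib = [0.0] * n_features
    node = fitted
    while node is not None:
        scale = sqrt(node["n_l"] * node["n_r"])
        go_left = x[node["feat"]] <= node["thr"]
        psi = node["n_r"] / scale if go_left else -node["n_l"] / scale
        contrib[node["feat"]] += node["beta"] * psi
        node = node["left"] if go_left else node["right"]
    return contrib


def tree_predict(node, idx, x):
    """Ordinary tree prediction: mean training response in the leaf that x reaches."""
    while node is not None:
        feat, thr, left, right = node
        go_left = x[feat] <= thr
        idx = [i for i in idx if (X[i][feat] <= thr) == go_left]
        node = left if go_left else right
    return sum(y[i] for i in idx) / len(idx)


all_idx = list(range(len(X)))
fitted = fit_node_basis(TREE, all_idx)
intercept = sum(y) / len(y)  # the linear model's intercept is the overall mean

for x in X:
    contrib = local_contributions(fitted, x, n_features=2)
    # The linear model on the node basis reproduces the tree's prediction.
    assert abs(intercept + sum(contrib) - tree_predict(TREE, all_idx, x)) < 1e-9
    print(x, contrib)
```

In this sketch each node's term equals the change in mean response from parent to child, so the per-feature sums explain how one sample's prediction departs from the overall mean. A global score, by contrast, would aggregate such terms over all samples.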
