ML, LG · May 25, 2021

SHAFF: Fast and consistent SHApley eFfect estimates via random Forests

Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet

arXiv:2105.11724v3 · 17.440 citations

AI Analysis

This work addresses the need for efficient and reliable interpretability tools in critical decision-making applications, offering a novel method that improves upon existing Shapley effect estimators.

The paper tackles the computational and accuracy challenges in estimating Shapley effects for variable importance in machine learning, introducing SHAFF, which provides fast and consistent estimates even with dependent variables, achieving significant speed improvements over competitors.

Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks because, unlike most other variable importance measures, they can efficiently handle dependence and interactions in the data. However, estimating Shapley effects is challenging, both because of the computational complexity and because conditional expectations must be estimated. Accordingly, existing Shapley algorithms have flaws: a costly running time, or a bias when input variables are dependent. Therefore, we introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimate, even when input variables are dependent. We demonstrate the efficiency of SHAFF through both a theoretical analysis of its consistency and extensive experiments showing practical performance improvements over competitors. An implementation of SHAFF in C++ and R is available online.
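To make the quantity being estimated concrete, here is a minimal sketch of the textbook definition of Shapley effects, computed by brute-force enumeration of all variable subsets. It is not the SHAFF algorithm and not the authors' code. The function names, the two-variable linear Gaussian toy model, and its parameters (`beta1`, `beta2`, `rho`) are assumptions chosen for illustration, because in that model the explained variance Var(E[Y | X_U]) has a closed form and no estimation is needed.

```python
from itertools import combinations
from math import comb


def shapley_effects(p, explained_variance):
    """Exact Shapley effects for p inputs (illustrative, hypothetical helper).

    explained_variance(U) must return Var(E[Y | X_U]) for a frozenset U of
    input indices. Every subset is enumerated, so the cost grows as 2^p.
    """
    total = explained_variance(frozenset(range(p)))  # equals Var(Y)
    effects = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        acc = 0.0
        for size in range(p):
            # Shapley weight for subsets of this size that exclude j.
            weight = 1.0 / (p * comb(p - 1, size))
            for subset in combinations(others, size):
                u = frozenset(subset)
                gain = explained_variance(u | {j}) - explained_variance(u)
                acc += weight * gain
        effects.append(acc / total)
    return effects


def linear_gaussian_value(beta1, beta2, rho):
    """Closed-form Var(E[Y | X_U]) for the assumed toy model
    Y = beta1*X1 + beta2*X2, with X1, X2 standard Gaussian and correlation rho.
    """
    def value(u):
        if u == frozenset():
            return 0.0
        if u == frozenset({0}):
            return (beta1 + rho * beta2) ** 2
        if u == frozenset({1}):
            return (beta2 + rho * beta1) ** 2
        return beta1 ** 2 + beta2 ** 2 + 2 * rho * beta1 * beta2
    return value


# X2 has a zero coefficient but is correlated with X1 (rho = 0.5).
effects = shapley_effects(2, linear_gaussian_value(1.0, 0.0, 0.5))
print(effects)  # [0.875, 0.125]
```

In this toy case the second input receives a nonzero share even though its coefficient is zero, because it is correlated with the first. The effects sum to one, as Shapley effects do by construction. The exponential enumeration and the need for conditional expectations in realistic models are the two difficulties the abstract refers to.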
