ML, LG · May 25, 2021

SHAFF: Fast and consistent SHApley eFfect estimates via random Forests

arXiv:2105.11724v3 · 40 citations
Originality: highly original
AI Analysis

This work addresses the need for efficient and reliable interpretability tools in critical decision-making applications, offering a novel method that improves upon existing Shapley effect estimators.

The paper tackles the computational and accuracy challenges in estimating Shapley effects for variable importance in machine learning, introducing SHAFF, which provides fast and consistent estimates even with dependent variables, achieving significant speed improvements over competitors.

Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools. Shapley effects are now widely used to interpret both tree ensembles and neural networks because, unlike most other variable importance measures, they can efficiently handle dependence and interactions in the data. However, estimating Shapley effects is challenging because of its computational complexity and the need to estimate conditional expectations. Accordingly, existing Shapley algorithms suffer from either a costly running time or a bias when input variables are dependent. We therefore introduce SHAFF, SHApley eFfects via random Forests, a fast and accurate Shapley effect estimator, even when input variables are dependent. We demonstrate SHAFF's efficiency through both a theoretical analysis of its consistency and practical performance improvements over competitors in extensive experiments. An implementation of SHAFF in C++ and R is available online.
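The two obstacles named above (the number of variable subsets grows exponentially, and every subset needs a conditional expectation) both show up in the standard definition of the Shapley effect of input X_j: Sh_j = sum over U ⊆ {1..p} \ {j} of |U|! (p - |U| - 1)! / p! × [v(U ∪ {j}) - v(U)], with v(U) = Var(E[Y | X_U]) / Var(Y). The following is a minimal Python sketch, assuming a linear model Y = beta^T X with Gaussian inputs X ~ N(0, Sigma). In that special case Var(E[Y | X_U]) has a closed form, so the effects can be computed exactly by brute-force enumeration. This is not SHAFF: SHAFF estimates these quantities with random forests so it applies when no closed form exists. The function names `explained_variance` and `shapley_effects` are hypothetical.

```python
import itertools
from math import factorial

import numpy as np


def explained_variance(beta, sigma, subset):
    """Var(E[Y | X_U]) for Y = beta^T X with X ~ N(0, sigma) and U = subset.

    For Gaussian inputs, Var(E[Y | X_U]) = Var(Y) - Var(Y | X_U), where
    Var(Y | X_U) = beta_R^T (S_RR - S_RU S_UU^{-1} S_UR) beta_R and R is
    the complement of U.
    """
    p = len(beta)
    total = beta @ sigma @ beta
    u = sorted(subset)
    r = [k for k in range(p) if k not in subset]
    if not u:
        return 0.0
    if not r:
        return total
    s_rr = sigma[np.ix_(r, r)]
    s_ru = sigma[np.ix_(r, u)]
    s_uu = sigma[np.ix_(u, u)]
    cond_cov = s_rr - s_ru @ np.linalg.solve(s_uu, s_ru.T)
    return total - beta[r] @ cond_cov @ beta[r]


def shapley_effects(beta, sigma):
    """Exact Shapley effects, normalized by Var(Y), enumerating all subsets.

    The cost is O(p * 2^(p-1)) conditional-variance evaluations, which is
    the exponential blow-up that practical estimators must avoid.
    """
    p = len(beta)
    total = beta @ sigma @ beta
    effects = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |U|! (p - |U| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for u in itertools.combinations(others, size):
                gain = (explained_variance(beta, sigma, set(u) | {j})
                        - explained_variance(beta, sigma, set(u)))
                effects[j] += weight * gain
    return effects / total


if __name__ == "__main__":
    # X1 and X2 correlated (rho = 0.5), X3 independent, equal coefficients.
    beta = np.array([1.0, 1.0, 1.0])
    sigma = np.array([[1.0, 0.5, 0.0],
                      [0.5, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
    effects = shapley_effects(beta, sigma)
    print(effects, effects.sum())
```

Because Shapley effects are normalized by Var(Y), they sum to one even when inputs are dependent. The correlated pair X1, X2 shares credit for their common variance instead of having it counted twice or dropped.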

Code implementations: 1 repo