WeightedSHAP: analyzing and improving Shapley based feature attributions
This work addresses limitations in model interpretability for machine learning practitioners, offering an incremental improvement over existing Shapley-based methods.
The paper tackles the problem of Shapley feature attribution assigning unintuitive importance due to uniform weighting of marginal contributions, and proposes WeightedSHAP, which learns weights from data to better identify influential features, demonstrating improved recapitulation of model predictions on real-world datasets.
Shapley value is a popular approach for measuring the influence of individual features. While Shapley feature attribution is built upon desiderata from game theory, some of its constraints may be less natural in certain machine learning settings, leading to unintuitive model interpretation. In particular, the Shapley value uses the same weight for all marginal contributions -- i.e. it gives the same importance when a large number of other features are given versus when a small number of other features are given. This property can be problematic if larger feature sets are more or less informative than smaller feature sets. Our work performs a rigorous analysis of the potential limitations of Shapley feature attribution. We identify simple settings where the Shapley value is mathematically suboptimal by assigning larger attributions for less influential features. Motivated by this observation, we propose WeightedSHAP, which generalizes the Shapley value and learns which marginal contributions to focus directly from data. On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.