ML · LG · Dec 7, 2020

Shapley values for cluster importance: How clusters of the training data affect a prediction

arXiv:2012.03625v2 · 12 citations
AI Analysis

This method offers a novel way to understand the impact of training data clusters on black-box model predictions, providing complementary insights for model developers and users.

This paper introduces a new method to explain predictions from data-driven models by quantifying how different clusters within the training data influence a specific prediction. It extends Shapley values, traditionally used for feature importance, to measure the importance of training data clusters.

This paper proposes a novel approach to explain the predictions made by data-driven methods. Since such predictions rely heavily on the data used for training, explanations that convey how the training data affects the predictions are useful. Specifically, the paper quantifies how different clusters of the training data affect a prediction. The quantification is based on Shapley values, a concept that originates from coalitional game theory and was developed to fairly distribute the payout among a set of cooperating players. A player's Shapley value is a measure of that player's contribution. Shapley values are often used to quantify feature importance, i.e., how features affect a prediction. This paper extends the idea to cluster importance, letting clusters of the training data act as players in a game where the predictions are the payouts. The proposed methodology lets us explore and investigate how different clusters of the training data affect the predictions made by any black-box model, allowing new aspects of the reasoning and inner workings of a prediction model to be conveyed to the users. The methodology is fundamentally different from existing explanation methods, providing insight that would not be available otherwise, and should complement existing explanation methods, including explanations based on feature importance.
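To make the game concrete, the following is a minimal sketch of the idea described in the abstract, not the paper's implementation. It assumes that a coalition's payout is the prediction of a model retrained on only the clusters in that coalition, and that the empty coalition pays a fixed baseline of 0. The toy data, the mean-predicting stand-in for a black-box model, and all names (`clusters`, `fit_mean_model`, `make_payout`, `BASELINE`) are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payout."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def fit_mean_model(rows):
    """Toy stand-in for a black box: predicts the mean training target."""
    targets = [y for _, y in rows]
    mean = sum(targets) / len(targets)
    return lambda x: mean


# Hypothetical training data, already split into clusters of (x, y) pairs.
clusters = {
    "A": [(0.0, 1.0), (0.1, 1.2)],
    "B": [(1.0, 3.0), (1.1, 3.2)],
    "C": [(2.0, 8.0)],
}
BASELINE = 0.0  # assumed payout of the empty coalition


def make_payout(x_query):
    """Payout = prediction at x_query of a model trained on the coalition."""
    cache = {}

    def payout(coalition):
        if coalition not in cache:
            rows = [r for name in coalition for r in clusters[name]]
            cache[coalition] = (
                fit_mean_model(rows)(x_query) if rows else BASELINE
            )
        return cache[coalition]

    return payout


payout = make_payout(x_query=1.5)
phi = shapley_values(list(clusters), payout)
full_prediction = payout(frozenset(clusters))
```

By the efficiency property of Shapley values, the entries of `phi` sum to `full_prediction - BASELINE`, so each cluster's value can be read as its share of the prediction. Exact enumeration retrains the model once per coalition, which grows as 2^n in the number of clusters, so this form is only practical for a handful of clusters.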
