AI LG TH | Aug 22, 2019

The many Shapley values for model explanation

arXiv:1908.08474v2 | 45.9842 citations

Originality: Incremental advance

AI Analysis

The paper addresses a foundational issue in interpretable AI that matters to both researchers and practitioners, but the advance is incremental because it builds on existing axiomatic approaches.

The paper tackles the problem of inconsistent Shapley value operationalizations in model explanation, which produce varying and sometimes counterintuitive attributions, and proposes Baseline Shapley (BShap) as a solution with a proper uniqueness result.

The Shapley value has become a popular method to attribute the prediction of a machine-learning model on an input to its base features. The use of the Shapley value is justified by citing [16], which shows that it is the unique method that satisfies certain good properties (axioms). There is, however, a multiplicity of ways in which the Shapley value is operationalized in the attribution problem. These differ in how they reference the model, the training data, and the explanation context, and they give very different results, rendering the uniqueness result meaningless. Furthermore, we find that previously proposed approaches can produce counterintuitive attributions in theory and in practice; for instance, they can assign non-zero attributions to features that are not even referenced by the model. In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. We also contrast BShap with Integrated Gradients, another extension of the Shapley value to the continuous setting.
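To make the baseline idea concrete, here is a minimal Python sketch of an exact, baseline-style Shapley attribution. It assumes the set function is v(S) = f(x on S, baseline elsewhere), which is one reading of how BShap is operationalized; the abstract itself does not spell this out. The names `baseline_shapley` and `toy_model` are hypothetical and chosen only for illustration. The enumeration is exponential in the number of features, so this is suitable only for very small inputs.

```python
from itertools import combinations
from math import factorial


def baseline_shapley(f, x, baseline):
    """Exact Shapley values for the game v(S) = f(x on S, baseline elsewhere).

    Assumption: this set function is our reading of a baseline-style
    operationalization; it is a sketch, not the authors' implementation.
    """
    n = len(x)

    def v(subset):
        # Features in `subset` take the explained input's value;
        # all other features take the baseline's value.
        point = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(point)

    attributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = frozenset(combo)
                # Marginal contribution of feature i to coalition s.
                attributions[i] += weight * (v(s | {i}) - v(s))
    return attributions


def toy_model(z):
    # Hypothetical model: the third feature is never referenced.
    return z[0] * z[1]


x = [2.0, 3.0, 5.0]
baseline = [0.0, 0.0, 0.0]
attr = baseline_shapley(toy_model, x, baseline)
print(attr)
```

In this toy case the unreferenced third feature receives zero attribution, and the attributions sum to `toy_model(x) - toy_model(baseline)`, which matches the behaviour the abstract argues an attribution method should have.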
