Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Values Approximation
This work addresses the need for reliable and efficient explanations of deep neural networks, which matters in fields such as AI safety and interpretability. The contribution is incremental in that it builds on existing Shapley value theory.
The authors propose a polynomial-time algorithm for approximating Shapley values in deep neural networks. Shapley values are theoretically sound but computationally expensive to evaluate exactly, and the authors report that their method produces significantly better approximations than existing state-of-the-art attribution methods.
The problem of explaining the behavior of deep neural networks has recently gained a lot of attention. While several attribution methods have been proposed, most come without strong theoretical foundations, which raises questions about their reliability. On the other hand, the literature on cooperative game theory suggests Shapley values as a unique way of assigning relevance scores such that certain desirable properties are satisfied. Unfortunately, the exact evaluation of Shapley values is prohibitively expensive, exponential in the number of input features. In this work, by leveraging recent results on uncertainty propagation, we propose a novel, polynomial-time approximation of Shapley values in deep neural networks. We show that our method produces significantly better approximations of Shapley values than existing state-of-the-art attribution methods.
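To make the exponential cost of exact evaluation concrete, the following is a minimal Python sketch of Shapley value computation by brute-force enumeration of all feature coalitions. It is not the polynomial-time approximation proposed in this work. The names `exact_shapley` and `toy_value`, the toy model, and the choice of a zero baseline for absent features are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition.
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical toy model f(x) = 2*x0 + 3*x1*x2 (an assumption for illustration).
x = [1.0, 2.0, 4.0]


def toy_value(coalition):
    # Features outside the coalition are replaced by a baseline of 0 (assumption).
    masked = [x[j] if j in coalition else 0.0 for j in range(len(x))]
    return 2.0 * masked[0] + 3.0 * masked[1] * masked[2]


phi = exact_shapley(toy_value, len(x))
print(phi)
```

In this naive form the number of model evaluations grows as n * 2^n for n input features, which is why exact evaluation becomes infeasible for realistic input sizes. The attributions also sum to the difference between the model output on the full input and on the baseline, one of the desirable properties that motivates Shapley values.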