Representer Point Selection for Explaining Deep Neural Networks
This work addresses the need for scalable, interpretable explanations of neural network decisions, which matters to both researchers and practitioners, though the contribution is incremental in that it builds on influence functions.
The paper tackles the problem of explaining deep neural network predictions by identifying influential training points, called representer points. It shows that a prediction can be decomposed into a linear combination of training-point activations, where positive weights mark excitatory points and negative weights mark inhibitory ones, which provides more insight than prior methods.
We propose to explain the predictions of a deep neural network by pointing to the set of what we call representer points in the training set, for a given test point prediction. Specifically, we show that we can decompose the pre-activation prediction of a neural network into a linear combination of activations of training points, with the weights corresponding to what we call representer values, which thus capture the importance of each training point on the learned parameters of the network. This provides a deeper understanding of the network than training point influence alone: positive representer values correspond to excitatory training points and negative values to inhibitory points, which, as we show, provides considerably more insight. Our method is also much more scalable, allowing for real-time feedback in a manner not feasible with influence functions.
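
The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single linear output layer trained with an L2 penalty, and it swaps in a squared loss so that the stationary point has a closed form and the decomposition holds exactly. The random arrays stand in for last-layer activations of a trained network, and the names `representer_values`, `decompose_prediction`, and `lam` are hypothetical.

```python
import numpy as np


def representer_values(features, targets, lam):
    """Fit an L2-regularized linear output layer and return (w, alpha).

    Assumed objective (squared loss, chosen for an exact closed form):
        (1/n) * sum_i (w . f_i - y_i)^2 + lam * ||w||^2
    At the stationary point, w = sum_i alpha_i * f_i with
        alpha_i = -(1 / (2 * lam * n)) * dL/dphi_i = -(phi_i - y_i) / (lam * n)
    """
    n, d = features.shape
    # Closed-form stationary point of the regularized objective.
    gram = features.T @ features / n + lam * np.eye(d)
    w = np.linalg.solve(gram, features.T @ targets / n)
    residual = features @ w - targets
    alpha = -residual / (lam * n)
    return w, alpha


def decompose_prediction(alpha, features, test_feature):
    """Per-training-point contributions alpha_i * (f_i . f_t)."""
    return alpha * (features @ test_feature)


# Synthetic stand-ins for last-layer activations (an assumption).
rng = np.random.default_rng(0)
train_features = rng.normal(size=(50, 5))
train_targets = rng.normal(size=50)
test_feature = rng.normal(size=5)

w, alpha = representer_values(train_features, train_targets, lam=0.1)
contributions = decompose_prediction(alpha, train_features, test_feature)
prediction = float(w @ test_feature)

# Positive contributions play the excitatory role, negative the inhibitory one.
most_excitatory = int(np.argmax(contributions))
most_inhibitory = int(np.argmin(contributions))
```

In this sketch the contributions sum to the pre-activation prediction, so sorting them ranks training points by how strongly they push the output up or down. Because each contribution needs only a dot product with stored activations, it can be computed per test point without retraining.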