Compensated Integrated Gradients to Reliably Interpret EEG Classification
This work addresses the challenge of reliably interpreting EEG classification models, offering an incremental improvement over existing attribution methods.
The paper tackles unreliable feature attribution in classification models caused by the choice of baseline in integrated gradients, and proposes a compensated integrated gradients method that removes the need for a baseline. Experiments on three EEG datasets show that the proposed method yields more reliable attributions than the original integrated gradients, at a much lower computational complexity than Shapley sampling.
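For reference, the standard definition of integrated gradients for a model f, an input x with n features, and a baseline x' is given below together with its completeness property. Because the attributions account for the difference f(x) - f(x'), they change with the choice of x', which is the source of the baseline problem. The Shapley value of feature i is also shown, where v(S) is assumed here to denote the model output when only the features in S take their input values; evaluating it exactly requires all 2^n coalitions, which is why it is estimated by sampling in practice.

```latex
\begin{align*}
\mathrm{IG}_i(x; x') &= (x_i - x'_i) \int_0^1
  \frac{\partial f\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i}\, d\alpha,
  \qquad
  \sum_{i=1}^{n} \mathrm{IG}_i(x; x') = f(x) - f(x'), \\
\phi_i &= \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],
  \qquad N = \{1, \dots, n\}.
\end{align*}
```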
Integrated gradients is widely employed to evaluate the contribution of input features in classification models because it satisfies the axioms for attribution of predictions. This method, however, requires an appropriate baseline to determine the contributions reliably. We propose a compensated integrated gradients method that does not require a baseline. Specifically, the method compensates the attributions calculated by integrated gradients at an arbitrary baseline using Shapley sampling. We prove that the method retrieves reliable attributions if the classifier processes each input feature independently of the others and in an identical manner, as shared weights do in convolutional neural networks. Using three electroencephalogram (EEG) datasets, we experimentally demonstrate that the attributions of the proposed method are more reliable than those of the original integrated gradients, and that its computational complexity is much lower than that of Shapley sampling.
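The following NumPy sketch illustrates the two ingredients the abstract combines: integrated gradients at an arbitrary baseline, approximated with a midpoint Riemann sum, and a permutation-sampling estimate of Shapley values. The classifier `f`, its shared parameters `W1` and `B1`, and all function names are hypothetical; the toy model is chosen only to mimic the stated assumption that every feature is processed independently by the same shared function. Absent features in the Shapley estimate are assumed to take their baseline values. The compensation step itself is not reproduced, since its exact form is not given here.

```python
import numpy as np

# Hypothetical toy classifier (an assumption, not the paper's model):
# each feature x_i passes independently through the same function h
# (mimicking shared weights), the results are summed, then a sigmoid.
W1, B1 = 1.5, -0.2  # hypothetical shared parameters


def h(x):
    return np.tanh(W1 * x + B1)


def dh(x):
    return W1 * (1.0 - np.tanh(W1 * x + B1) ** 2)


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def f(x):
    """Classifier output for one input vector x of shape (n,)."""
    return sigmoid(np.sum(h(x)))


def grad_f(x):
    """Analytic gradient of f with respect to x."""
    p = f(x)
    return p * (1.0 - p) * dh(x)


def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Costs `steps` gradient evaluations, independent of the number of features.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps


def shapley_sampling(x, baseline, n_perm=200, rng=None):
    """Permutation-sampling estimate of Shapley values.

    Features not yet added to the coalition keep their baseline value
    (an assumption of this sketch). Each permutation costs n + 1 model
    evaluations, so the total cost grows with the number of features.
    """
    rng = np.random.default_rng(rng)
    n = x.size
    phi = np.zeros(n)
    for _ in range(n_perm):
        z = baseline.copy()
        prev = f(z)
        for i in rng.permutation(n):
            z[i] = x[i]          # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return phi / n_perm


if __name__ == "__main__":
    x = np.array([0.8, -0.3, 1.2, 0.1])
    baseline = np.zeros_like(x)
    print("IG:      ", integrated_gradients(x, baseline))
    print("Shapley: ", shapley_sampling(x, baseline, rng=0))
    print("f(x) - f(x'):", f(x) - f(baseline))
```

In this sketch both estimates satisfy completeness, so their attributions sum to f(x) - f(x'). The cost difference is the point of contrast: integrated gradients needs a fixed number of gradient evaluations, while each sampled permutation needs n + 1 forward passes, which becomes expensive for EEG inputs with many channels and time samples.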