Order in the Court: Explainable AI Methods Prone to Disagreement
This work examines why explainable AI methods produce inconsistent explanations and what that inconsistency means for how practitioners evaluate such methods.
The study measured agreement among explainable AI methods, including LIME, Integrated Gradients, and attention-based explanations, on language tasks. In most cases none of the methods agreed, and the authors conclude that rank correlation does not measure the quality of feature-additive methods.
By computing the rank correlation between attention weights and feature-additive explanation methods, previous analyses either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience. To investigate whether this approach is appropriate, we compare LIME, Integrated Gradients, DeepLIFT, Grad-SHAP, Deep-SHAP, and attention-based explanations, applied to two neural architectures trained on single- and pair-sequence language tasks. In most cases, we find that none of our chosen methods agree. Based on our empirical observations and theoretical objections, we conclude that rank correlation does not measure the quality of feature-additive methods. Practitioners should instead use the numerous and rigorous diagnostic methods proposed by the community.
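To make the comparison concrete, the following is a minimal sketch of computing a rank correlation between two sets of per-token importance scores. It is an illustration under stated assumptions, not the authors' evaluation code. The abstract does not say which rank correlation statistic is used, so Kendall's tau-a is an assumption here. The function name `kendall_tau` and the scores `attention` and `lime_scores` are hypothetical values made up for the example.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(a) != len(b):
        raise ValueError("score lists must have the same length")
    if len(a) < 2:
        raise ValueError("need at least two scores to rank")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(a)), 2):
        # The sign of the product says whether both methods order
        # tokens i and j the same way.
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for a five-token input (not taken from the paper).
# Attention weights are non-negative; feature-additive scores are signed.
attention = [0.10, 0.40, 0.05, 0.30, 0.15]
lime_scores = [0.02, -0.50, 0.01, 0.35, 0.10]

tau_signed = kendall_tau(attention, lime_scores)
tau_abs = kendall_tau(attention, [abs(s) for s in lime_scores])

print(tau_signed)  # 0.2
print(tau_abs)     # 1.0
```

In this toy case the same pair of explanations looks weakly correlated or perfectly correlated depending on whether the signed scores or their magnitudes are ranked. The absolute-value variant is included only to show how sensitive the statistic is to such preprocessing choices. Whether the authors apply it is not stated in the abstract.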