LG · AI · Jul 29, 2024

Revisiting the robustness of post-hoc interpretability methods

arXiv:2407.19683v1 · 10 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses the need for sample-level evaluation of post-hoc interpretability methods. The advance is incremental because it builds on existing coarse-grained evaluation strategies.

The paper tackles the problem of inconsistent results from post-hoc interpretability methods in explainable AI by proposing a new approach and two metrics for fine-grained assessment of robustness, showing that robustness is generally linked to coarse-grained performance.

Post-hoc interpretability methods play a critical role in explainable artificial intelligence (XAI), as they pinpoint portions of data that a trained deep learning model deemed important to make a decision. However, different post-hoc interpretability methods often provide different results, casting doubts on their accuracy. For this reason, several evaluation strategies have been proposed to understand the accuracy of post-hoc interpretability. Many of these evaluation strategies provide a coarse-grained assessment -- i.e., they evaluate how the performance of the model degrades on average by corrupting different data points across multiple samples. While these strategies are effective in selecting the post-hoc interpretability method that is most reliable on average, they fail to provide a sample-level, also referred to as fine-grained, assessment. In other words, they do not measure the robustness of post-hoc interpretability methods. We propose an approach and two new metrics to provide a fine-grained assessment of post-hoc interpretability methods. We show that the robustness of a post-hoc interpretability method is generally linked to its coarse-grained performance.
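
The distinction the abstract draws between coarse-grained and fine-grained assessment can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy linear scorer, the gradient-times-input attributions, the zero baseline, and the helper names `corrupt_top_k` and `per_sample_drop` are hypothetical, and the per-sample drop shown is not one of the two metrics the paper proposes.

```python
import numpy as np


def corrupt_top_k(x, attribution, k, baseline=0.0):
    """Return a copy of x with its k highest-attribution features set to a baseline."""
    corrupted = x.copy()
    top_k = np.argsort(attribution)[-k:]
    corrupted[top_k] = baseline
    return corrupted


def per_sample_drop(predict, X, attributions, k):
    """Drop in model output for each sample after corrupting its top-k features."""
    return np.array([
        predict(x) - predict(corrupt_top_k(x, a, k))
        for x, a in zip(X, attributions)
    ])


# Toy setup (hypothetical): a linear scorer standing in for a trained model.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
X = rng.normal(size=(100, 8))


def predict(x):
    return float(w @ x)


# Gradient-times-input attribution: each feature's signed contribution to the score.
attributions = X * w

drops = per_sample_drop(predict, X, attributions, k=3)

coarse_grained = drops.mean()  # one number: average degradation over all samples
fine_grained = drops           # one number per sample
```

The average in `coarse_grained` can look healthy even when individual entries of `fine_grained` are small or negative, which is the sample-level information an averaged evaluation discards.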
