LG | AI | CR | Sep 9, 2023

Good-looking but Lacking Faithfulness: Understanding Local Explanation Methods through Trend-based Testing

arXiv:2309.05679v1 | 5 citations | h-index: 23 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable evaluation of explanation methods in AI interpretability, which is crucial for model debugging and security, though it is incremental as it builds on existing faithfulness testing.

The paper tackles the problem of evaluating the faithfulness of local explanation methods for deep learning models, proposing three trend-based tests that outperform traditional tests on image, natural language, and security tasks, enabling assessment on complex data for the first time.

While enjoying the great achievements brought by deep learning (DL), people are also worried about the decisions made by DL models, since the high degree of non-linearity of DL models makes these decisions extremely difficult to understand. Consequently, attacks such as adversarial attacks are easy to carry out, but difficult to detect and explain, which has led to a boom in the research on local explanation methods for explaining model decisions. In this paper, we evaluate the faithfulness of explanation methods and find that traditional tests on faithfulness encounter the random dominance problem, i.e., random selection performs the best, especially for complex data. To solve this problem, we propose three trend-based faithfulness tests and empirically demonstrate that the new trend tests can better assess faithfulness than traditional tests on image, natural language and security tasks. We implement the assessment system and evaluate ten popular explanation methods. Benefiting from the trend tests, we successfully assess the explanation methods on complex data for the first time, bringing unprecedented discoveries and inspiring future research. Downstream tasks also greatly benefit from the tests. For example, model debugging equipped with faithful explanation methods performs much better for detecting and correcting accuracy and security problems.
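To make the "random dominance" problem concrete, the following is a minimal sketch of a traditional deletion-style faithfulness test compared against random feature selection. It is not the paper's trend-based tests, which the abstract does not specify. The toy linear model and the names `deletion_drop` and `random_dominates` are hypothetical and chosen only for illustration.

```python
import numpy as np

def deletion_drop(predict, x, order, k, fill=0.0):
    """Drop in model output after replacing the first k features in `order` with `fill`."""
    x_del = x.copy()
    x_del[order[:k]] = fill
    return predict(x) - predict(x_del)

def random_dominates(predict, x, attribution, k, n_random=100, seed=0):
    """True if random deletion lowers the output, on average, at least as much
    as deletion guided by the attribution scores (assumed: higher = more important)."""
    rng = np.random.default_rng(seed)
    guided_order = np.argsort(-attribution)  # most important feature first
    guided_drop = deletion_drop(predict, x, guided_order, k)
    random_drops = [
        deletion_drop(predict, x, rng.permutation(x.size), k)
        for _ in range(n_random)
    ]
    return float(np.mean(random_drops)) >= guided_drop

# Toy linear model (an assumption for illustration): output = w . x
w = np.array([3.0, -1.0, 0.5, 2.0, 0.0])
predict = lambda v: float(w @ v)
x = np.ones(5)

faithful = w * x         # exact per-feature contribution for a linear model
unfaithful = -faithful   # deliberately inverted ranking

print(random_dominates(predict, x, faithful, k=2))
print(random_dominates(predict, x, unfaithful, k=2))
```

On this toy model the faithful attribution beats random selection and the inverted one does not. The abstract's finding is that, on complex data, traditional tests of this kind can rank random selection best even for reasonable explanation methods, which is what motivates the trend-based tests.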

Code Implementations: 1 repo
