AI CV LG

Jul 18, 2023

Saliency strikes back: How filtering out high frequencies improves white-box explanations

Sabine Muzellec, Thomas Fel, Victor Boutin, Léo Andéol, Rufin VanRullen, Thomas Serre

Harvard

arXiv:2307.09591v47.95 citations

h-index: 22

Originality: Incremental advance

AI Analysis

This work addresses a limitation in explainable AI for users needing efficient and faithful model explanations, though it is incremental as it improves existing methods rather than introducing a new paradigm.

The paper tackles high-frequency artifacts that contaminate the gradient signal in white-box attribution methods for explainable AI. It introduces FORGrad to filter out these artifacts and reports that doing so consistently enhances performance, enabling white-box methods to compete with more accurate but computationally demanding black-box methods.

Attribution methods correspond to a class of explainability methods (XAI) that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one type of attribution method, known as "white-box" methods. Although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called "FORGrad". This simple method effectively filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of already existing white-box methods, enabling them to compete effectively with more accurate yet computationally demanding "black-box" methods. We anticipate that our research will foster broader adoption of simpler and more efficient white-box methods for explainability, offering a better balance between faithfulness and computational efficiency.
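The general idea of low-pass filtering a gradient-based attribution map can be sketched as follows. This is only an illustration under stated assumptions, not the FORGrad implementation: the function name `low_pass_filter`, the hard circular mask, and the cutoff value of 0.1 cycles per pixel are all hypothetical, and the abstract does not say how the per-architecture cut-off frequencies are chosen. A synthetic noisy map stands in for a real gradient.

```python
import numpy as np


def low_pass_filter(saliency: np.ndarray, cutoff: float) -> np.ndarray:
    """Zero out spatial frequencies whose radius exceeds `cutoff`.

    `cutoff` is in cycles per pixel. The hard circular mask is an
    illustrative assumption, not the filter used in the paper.
    """
    h, w = saliency.shape
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies, shape (h, 1)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies, shape (1, w)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    mask = radius <= cutoff           # keep only the low frequencies
    spectrum = np.fft.fft2(saliency)
    return np.real(np.fft.ifft2(spectrum * mask))


# Synthetic stand-in for a gradient map: a smooth blob plus white noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
smooth = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 8.0 ** 2))
noisy = smooth + 0.5 * rng.standard_normal((64, 64))

# Hypothetical cutoff; a real one would be tuned per model architecture.
filtered = low_pass_filter(noisy, cutoff=0.1)

err_before = np.mean((noisy - smooth) ** 2)
err_after = np.mean((filtered - smooth) ** 2)
print(f"MSE to smooth map before: {err_before:.4f}, after: {err_after:.4f}")
```

In this toy setting the filtered map is closer to the underlying smooth structure than the noisy one, because the blob's energy sits at low frequencies while white noise is spread over the whole spectrum. With a real model, `noisy` would be replaced by the gradient of the output with respect to the input.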
