ML LG Nov 2, 2017

The (Un)reliability of saliency methods

arXiv:1711.00867v1, 786 citations
Originality: Incremental advance
AI Analysis

The paper highlights a critical flaw in widely used explanation techniques for AI models, one that could mislead researchers and practitioners when they interpret model decisions.

The paper tackles the problem of unreliable saliency methods in deep neural networks, showing that common methods produce incorrect attributions when inputs are shifted by a constant, even though the shift has no effect on the model's predictions.

Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.
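The constant-shift argument in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes a hypothetical single linear unit f(x) = w . x + b, an arbitrary shift of 2.0 per feature, and two illustrative attributions (a plain gradient and gradient times input). The names predict, gradient_saliency, and gradient_times_input are made up for this example. A second model absorbs the shift into its bias, so both models give the same prediction, and the sketch checks which attribution stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: a single linear unit f(x) = w . x + b.
w = rng.normal(size=4)
b = 0.5
x = rng.normal(size=4)

# Constant shift applied to the input as a pre-processing step (assumed value).
shift = np.full(4, 2.0)
x_shifted = x + shift

# Second model: same weights, bias adjusted so the shift has no effect on the output.
b_shifted = b - w @ shift


def predict(weights, bias, inputs):
    """Output of the linear unit."""
    return weights @ inputs + bias


def gradient_saliency(weights, inputs):
    """Gradient of the output w.r.t. the input; for a linear unit this is w."""
    return weights.copy()


def gradient_times_input(weights, inputs):
    """Elementwise product of the gradient and the input."""
    return weights * inputs


out_original = predict(w, b, x)
out_shifted = predict(w, b_shifted, x_shifted)

grad_original = gradient_saliency(w, x)
grad_shifted = gradient_saliency(w, x_shifted)
gxi_original = gradient_times_input(w, x)
gxi_shifted = gradient_times_input(w, x_shifted)

print("same prediction:", bool(np.isclose(out_original, out_shifted)))
print("gradient unchanged:", bool(np.allclose(grad_original, grad_shifted)))
print("gradient*input unchanged:", bool(np.allclose(gxi_original, gxi_shifted)))
```

In this toy setting the two models agree on the prediction and the plain gradient is identical for both, while gradient times input changes by w * shift. That is the kind of sensitivity the input-invariance requirement is meant to rule out; how the methods studied in the paper behave on real networks is covered by the paper itself, not by this sketch.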

Code Implementations: 1 repo
