CV | LG | Apr 2, 2024

CAM-Based Methods Can See through Walls

arXiv:2404.01964v2 | 4 citations | h-index: 12 | ECML/PKDD
AI Analysis

The paper reveals a flaw in widely used interpretability methods: they can assign importance to image regions that the model never uses. Researchers and practitioners who rely on these saliency maps may therefore misinterpret model behavior.

The paper demonstrates that CAM-based interpretability methods can attribute importance scores to image regions the model cannot actually see. The theoretical part analyzes GradCAM on a masked CNN. The experimental part trains a VGG-like model constrained to ignore the lower part of the image and still observes positive scores there.

CAM-based methods are widely used post-hoc interpretability methods that produce a saliency map to explain the decision of an image classification model. The saliency map highlights the areas of the image that are relevant to the prediction. In this paper, we show that most of these methods can incorrectly attribute an importance score to parts of the image that the model cannot see. We show that this phenomenon occurs both theoretically and experimentally. On the theory side, we analyze the behavior of GradCAM on a simple masked CNN model at initialization. Experimentally, we train a VGG-like model constrained not to use the lower part of the image and nevertheless observe positive scores in the unseen part of the image. This behavior is evaluated quantitatively on two new datasets. We believe that this is problematic, potentially leading to misinterpretation of the model's behavior.
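The mechanism described in the abstract can be illustrated with a toy sketch. This is not the paper's masked CNN or its VGG-like model: the classifier head, the random activations, the positive class weights, and the names `masked_head`, `grad_cam`, and `visible` are all assumptions made for illustration. The head averages each feature map over the upper half only, so the score does not depend on the lower half. GradCAM nevertheless applies one global weight per channel to the whole feature map, so non-zero activations in the unseen half still receive a positive score.

```python
import numpy as np

def masked_head(acts, weights, visible):
    """Toy classifier head (hypothetical): average each channel over the
    visible positions only, then apply a linear layer."""
    pooled = (acts * visible).sum(axis=(1, 2)) / visible.sum()
    return float(weights @ pooled)

def grad_cam(acts, weights, visible):
    """GradCAM for the toy head, using its analytic gradient."""
    # d(score)/d(acts[k, i, j]) is weights[k] / n_visible inside the
    # visible region and exactly 0 outside it.
    grads = weights[:, None, None] * visible[None] / visible.sum()
    # GradCAM channel weights: gradients averaged over ALL spatial positions.
    alpha = grads.mean(axis=(1, 2))
    # Saliency map: ReLU of the alpha-weighted sum of the activations.
    cam = np.maximum((alpha[:, None, None] * acts).sum(axis=0), 0.0)
    return cam, grads

rng = np.random.default_rng(0)
K, H, W = 4, 6, 6
acts = rng.random((K, H, W))      # non-negative, like post-ReLU feature maps
weights = rng.random(K) + 0.1     # positive class weights (assumption)
visible = np.zeros((H, W))
visible[: H // 2] = 1.0           # the head only reads the upper half

cam, grads = grad_cam(acts, weights, visible)

# Replacing the lower-half activations leaves the score unchanged,
# so the head really cannot see that region.
perturbed = acts.copy()
perturbed[:, H // 2:] = rng.random((K, H - H // 2, W))
score_unchanged = np.isclose(
    masked_head(acts, weights, visible),
    masked_head(perturbed, weights, visible),
)

print("score independent of lower half:", bool(score_unchanged))
print("gradient is zero in lower half:", bool(np.all(grads[:, H // 2:] == 0)))
print("GradCAM mass in lower half:", float(cam[H // 2:].sum()))
```

In this sketch the gradient is exactly zero in the masked region, yet the saliency map is not, because the channel weights are spatially global. The positive class weights are chosen so that the effect is guaranteed to appear; with mixed-sign weights it would depend on the activations.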

Code Implementations: 1 repo