Assessing the Noise Robustness of Class Activation Maps: A Framework for Reliable Model Interpretability
This work addresses the reliability of interpretability methods for deep learning models, which matters to users in fields such as healthcare and autonomous systems. The contribution is incremental in that it builds on existing CAM techniques.
The paper evaluates the noise robustness of Class Activation Maps (CAMs) for model interpretability, finds considerable variability in noise sensitivity across methods, and proposes a new robustness metric based on consistency and responsiveness.
Class Activation Maps (CAMs) are an important family of methods for visualizing the regions that deep learning models use. Yet their robustness to different types of noise remains underexplored. In this work, we evaluate and report the resilience of various CAM methods to different noise perturbations across multiple architectures and datasets. By analyzing the influence of different noise types on CAM explanations, we assess their susceptibility to noise and the extent to which dataset characteristics may impact explanation stability. The findings highlight considerable variability in noise sensitivity across CAM methods. We propose a robustness metric for CAMs that captures two key properties: consistency and responsiveness. Consistency reflects the ability of CAMs to remain stable under input perturbations that do not alter the predicted class, while responsiveness measures the sensitivity of CAMs to changes in the prediction caused by such perturbations. The metric is evaluated empirically across models, perturbations, and datasets, along with complementary statistical tests, to demonstrate the applicability of our proposed approach.
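To make the two properties concrete, the following is a minimal sketch of how consistency and responsiveness could be computed for a single input. It is not the metric defined in the paper: the abstract does not give a formula, so the use of cosine similarity between flattened maps, the averaging over perturbations, and all names (`cosine_similarity`, `consistency_and_responsiveness`, `perturbed`) are assumptions made purely for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Cam = Sequence[float]  # a CAM flattened to a 1-D sequence (assumed representation)


def cosine_similarity(a: Cam, b: Cam) -> float:
    """Cosine similarity of two flattened maps; 0.0 if either map is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_and_responsiveness(
    clean_cam: Cam,
    clean_label: int,
    perturbed: List[Tuple[Cam, int]],
) -> Tuple[Optional[float], Optional[float]]:
    """Illustrative scores for one input (assumed definitions, not the paper's).

    perturbed holds (cam, predicted_label) pairs, one per noisy copy of the input.
    Consistency: mean similarity to the clean CAM when the prediction is unchanged.
    Responsiveness: mean dissimilarity (1 - similarity) when the prediction changes.
    A score is None when no perturbation falls into its group.
    """
    unchanged: List[float] = []
    changed: List[float] = []
    for cam, label in perturbed:
        sim = cosine_similarity(clean_cam, cam)
        if label == clean_label:
            unchanged.append(sim)
        else:
            changed.append(1.0 - sim)
    consistency = sum(unchanged) / len(unchanged) if unchanged else None
    responsiveness = sum(changed) / len(changed) if changed else None
    return consistency, responsiveness


if __name__ == "__main__":
    clean = [1.0, 0.0, 0.0, 0.0]
    noisy = [
        ([1.0, 0.0, 0.0, 0.0], 0),  # same class, identical map
        ([0.0, 1.0, 0.0, 0.0], 1),  # class flipped, map moved elsewhere
    ]
    print(consistency_and_responsiveness(clean, 0, noisy))
```

Under these assumed definitions, a CAM method scores well when its maps barely move while the prediction holds and shift clearly once the prediction flips. Keeping the two groups separate avoids rewarding a method that simply never changes its map.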