CASE: Contrastive Activation for Saliency Estimation
This work addresses a reliability problem in saliency methods for model interpretability, one that matters to AI researchers and practitioners. The contribution is incremental, however, because it builds on existing diagnostic frameworks.
The authors address class insensitivity in saliency methods, showing that many widely used methods produce nearly identical explanations for different class labels. They introduce CASE, which their diagnostic and fidelity tests show to produce more faithful and class-specific explanations.
Saliency methods are widely used to visualize which input features are deemed relevant to a model's prediction. However, their visual plausibility can obscure critical limitations. In this work, we propose a diagnostic test for class sensitivity: a method's ability to distinguish between competing class labels on the same input. Through extensive experiments, we show that many widely used saliency methods produce nearly identical explanations regardless of the class label, calling into question their reliability. We find that class-insensitive behavior persists across architectures and datasets, suggesting the failure mode is structural rather than model-specific. Motivated by these findings, we introduce CASE, a contrastive explanation method that isolates features uniquely discriminative for the predicted class. We evaluate CASE using the proposed diagnostic and a perturbation-based fidelity test, and show that it produces faithful and more class-specific explanations than existing methods.
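To make the idea of a class-sensitivity diagnostic concrete, the following is a minimal sketch and not the paper's actual test. It assumes that agreement between two saliency maps can be measured by the overlap of their top-k features. The function names (`top_k_overlap`, `grad_times_input`, `input_magnitude`), the toy linear model, and the choice of k are all hypothetical, chosen only for illustration.

```python
import numpy as np

def top_k_overlap(map_a, map_b, k):
    """Fraction of indices shared by the k highest-magnitude features of two maps."""
    top_a = set(np.argsort(-np.abs(map_a).ravel())[:k].tolist())
    top_b = set(np.argsort(-np.abs(map_b).ravel())[:k].tolist())
    return len(top_a & top_b) / k

def grad_times_input(weights, x, class_idx):
    """Gradient-times-input for a toy linear model with logits = weights @ x."""
    return weights[class_idx] * x

def input_magnitude(weights, x, class_idx):
    """Deliberately class-insensitive baseline: the class label is ignored."""
    return np.abs(x)

# Toy setup (an assumption): two classes that rely on disjoint features.
weights = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

# Compare the explanation for class 0 against the explanation for class 1.
sensitive = top_k_overlap(grad_times_input(weights, x, 0),
                          grad_times_input(weights, x, 1), k=2)
insensitive = top_k_overlap(input_magnitude(weights, x, 0),
                            input_magnitude(weights, x, 1), k=2)

print(sensitive)    # 0.0: the two class-specific maps share no top features
print(insensitive)  # 1.0: the map is identical regardless of the class label
```

Under this sketch, an overlap near 1 across competing labels on the same input signals the class-insensitive behavior described above, while a low overlap indicates that the explanation changes with the label. The overlap score is only one possible agreement measure; rank correlation or a structural similarity score could be substituted.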