LG, CV · May 16

AIM: Adversarial Information Masking for Faithfulness Evaluation of Saliency Maps

arXiv:2605.16905
AI Analysis

Provides a more reliable evaluation framework for saliency map faithfulness, addressing a known bottleneck in interpretability research.

Existing faithfulness evaluations of saliency maps are confounded by the masking operator: zero masking creates out-of-distribution artifacts, while interpolation masking preserves residual information. AIM instead replaces features with adversarial values and compares degradation under complementary masking orders; experiments on image, audio, and EEG tasks suggest this reduces masking-induced bias.
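To make the three masking operators concrete, here is a minimal sketch in Python with NumPy. It assumes a toy logistic model as a stand-in for a deep network, gradient-times-input as the saliency method, a feature-mean fill as a crude stand-in for interpolation-based masking, and an FGSM-style single signed-gradient step as the "adversarial counterpart". All function names (`predict`, `adversarial_counterpart`, `mask`) are hypothetical, and the paper's actual construction of the adversarial counterpart may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Positive-class probability of a toy logistic model (stand-in for a network)."""
    return sigmoid(x @ w + b)

def adversarial_counterpart(w, b, x, eps):
    """FGSM-style counterpart (assumption): one signed-gradient step that lowers
    the logit of the predicted class. For a linear logit the input gradient is w."""
    direction = -np.sign(w) if x @ w + b >= 0 else np.sign(w)
    return x + eps * direction

def mask(x, order, k, operator, x_adv=None):
    """Replace the first k features of `order` using one masking operator."""
    out = x.copy()
    idx = order[:k]
    if operator == "zero":
        out[idx] = 0.0                 # may create out-of-distribution inputs
    elif operator == "mean":
        out[idx] = x.mean()            # crude stand-in for interpolation-based masking
    elif operator == "adversarial":
        out[idx] = x_adv[idx]          # AIM-style replacement from the counterpart
    else:
        raise ValueError(f"unknown operator: {operator}")
    return out

rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x = rng.normal(size=8)
x_adv = adversarial_counterpart(w, b, x, eps=0.5)
saliency = x * w                       # gradient-times-input for a linear model
order = np.argsort(-saliency)          # most relevant features first
for op in ("zero", "mean", "adversarial"):
    curve = [predict(w, b, mask(x, order, k, op, x_adv)) for k in range(len(x) + 1)]
    print(op, np.round(curve, 3))
```

The design point the sketch illustrates: zero and mean fills inject values chosen without regard to the model, whereas the adversarial fill takes each masked feature from an input that is known to move the prediction away from the original class.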

Post-hoc saliency methods are widely used to interpret deep neural networks, but their faithfulness is difficult to evaluate reliably. Existing evaluations mask features according to a saliency-induced feature ordering and measure performance degradation, but this degradation can be confounded by the masking operator: zero masking may create out-of-distribution artifacts, while interpolation-based masking may preserve residual predictive information. We propose Adversarial Information Masking (AIM), a saliency-guided adversarial feature replacement framework for evaluating both saliency-map faithfulness and masking-operator reliability. AIM replaces selected features with values from an adversarial counterpart of the input and compares degradation under complementary masking orders. We assess reliability using random-attribution bias and the stability of explanation-method faithfulness rankings. Experiments on image, audio, and EEG tasks suggest that AIM reduces masking-induced bias compared with zero and interpolation-based masking, while revealing modality-dependent differences between signed and unsigned attributions.
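One way to read "compares degradation under complementary masking orders" and "random-attribution bias" is sketched below in Python with NumPy. It assumes a gap metric defined as the mean score along the least-relevant-first (LeRF) curve minus the mean score along the most-relevant-first (MoRF) curve, and treats random-attribution bias as the average gap assigned to random saliency maps. Both definitions, the toy linear model, and the names `degradation_curve`, `faithfulness_gap`, and `random_attribution_bias` are hypothetical; the paper's exact metrics may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def degradation_curve(score_fn, x, x_ref, order):
    """Scores after replacing the first k features of `order` (k = 0..d) with x_ref values."""
    scores = []
    for k in range(len(x) + 1):
        z = x.copy()
        z[order[:k]] = x_ref[order[:k]]
        scores.append(score_fn(z))
    return np.array(scores)

def faithfulness_gap(score_fn, x, x_ref, saliency):
    """Hypothetical complementary-order metric: mean score along the
    least-relevant-first curve minus mean score along the most-relevant-first curve."""
    morf = np.argsort(-saliency)   # most relevant first
    lerf = morf[::-1]              # least relevant first (the complementary order)
    return float(degradation_curve(score_fn, x, x_ref, lerf).mean()
                 - degradation_curve(score_fn, x, x_ref, morf).mean())

def random_attribution_bias(score_fn, x, x_ref, n_draws, rng):
    """Average gap assigned to random saliency maps; ideally close to 0."""
    gaps = [faithfulness_gap(score_fn, x, x_ref, rng.normal(size=x.shape))
            for _ in range(n_draws)]
    return float(np.mean(gaps))

rng = np.random.default_rng(0)
w, b = rng.normal(size=10), 0.0
x = rng.normal(size=10)

def score(z):
    return sigmoid(z @ w + b)      # positive-class probability of a toy linear model

x_ref = x - 0.5 * np.sign(w)       # hypothetical adversarial counterpart: lowers the logit
saliency = w * (x - x_ref)         # exact per-feature logit contribution for this model
gap = faithfulness_gap(score, x, x_ref, saliency)
bias = random_attribution_bias(score, x, x_ref, n_draws=200, rng=rng)
print(f"exact attribution gap: {gap:.3f}, random attribution bias: {bias:.3f}")
```

Because reversing a uniformly random ordering yields another uniformly random ordering, this particular gap has zero expected value for random attributions whatever replacement values are used, while a single-order metric (the MoRF curve alone) has no such guarantee. That is one plausible reason complementary orders help separate attribution quality from operator effects, offered here as an interpretation rather than the authors' stated argument.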
