LG | AI | CV | Jun 18, 2025

Pixel-level Certified Explanations via Randomized Smoothing

arXiv:2506.15499v1 | 5 citations | h-index: 137 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This paper addresses the vulnerability of attribution methods to adversarial perturbations, which undermines trust in AI explanations. The advance is incremental, since it builds on existing randomized smoothing techniques.

The paper tackles non-robust pixel-level explanations in deep learning by introducing a certification framework that uses randomized smoothing to guarantee robustness for any black-box attribution method. Evaluated on 12 attribution methods and 5 ImageNet models, the certified attributions are reported to be robust, interpretable, and faithful.

Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. However, these explanations are highly non-robust: small, imperceptible input perturbations can drastically alter the attribution map while maintaining the same prediction. This vulnerability undermines their trustworthiness and calls for rigorous robustness guarantees of pixel-level attribution scores. We introduce the first certification framework that guarantees pixel-level robustness for any black-box attribution method using randomized smoothing. By sparsifying and smoothing attribution maps, we reformulate the task as a segmentation problem and certify each pixel's importance against $\ell_2$-bounded perturbations. We further propose three evaluation metrics to assess certified robustness, localization, and faithfulness. An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks. Our code is at https://github.com/AlaaAnani/certified-attributions.
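The "sparsify, smooth, certify each pixel" idea in the abstract can be illustrated with a minimal sketch. It is not the code from the linked repository. It assumes the generic randomized-smoothing recipe used for segmentation-style certification: binarize each noisy attribution map to its top fraction of pixels, take a per-pixel majority vote over Gaussian-noised inputs, and abstain where the vote is not confident. The function names, the threshold `tau`, the Hoeffding bound with Bonferroni correction (looser than the Clopper-Pearson bound often used in practice), and the flat-list image representation are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def sparsify(attribution, top_fraction):
    """Binarize a flat attribution map: 1 for the top `top_fraction` of pixels, else 0."""
    k = max(1, int(round(top_fraction * len(attribution))))
    threshold = sorted(attribution, reverse=True)[k - 1]
    return [1 if a >= threshold else 0 for a in attribution]


def certify_pixels(attribute, x, sigma, n, top_fraction,
                   tau=0.75, alpha=0.001, rng=None):
    """Hypothetical helper: per-pixel certification of a sparsified attribution map.

    attribute: any black-box function mapping a flat input to a flat attribution map.
    Returns (labels, radius); labels[i] is 1 (important), 0 (not important),
    or None (abstain). Non-abstaining labels hold within l2 radius `radius`
    with probability at least 1 - alpha under the assumptions in the lead-in.
    """
    rng = rng or random.Random(0)
    d = len(x)
    counts = [0] * d  # how often each pixel landed in the top set
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        mask = sparsify(attribute(noisy), top_fraction)
        for i, m in enumerate(mask):
            counts[i] += m

    # One-sided Hoeffding slack, Bonferroni-corrected over d pixels and
    # both possible labels per pixel (hence the factor 2 * d).
    slack = math.sqrt(math.log(2 * d / alpha) / (2 * n))

    labels = []
    for c in counts:
        p1 = c / n
        majority = 1 if p1 >= 0.5 else 0
        p_lower = max(p1, 1.0 - p1) - slack
        labels.append(majority if p_lower > tau else None)

    # Radius implied by a majority probability above tau under Gaussian noise.
    radius = sigma * NormalDist().inv_cdf(tau)
    return labels, radius
```

Calling `certify_pixels(lambda z: z, x, sigma=0.5, n=200, top_fraction=0.25)` with a toy input `x` treats the input itself as its attribution map. A real use would pass an attribution method wrapped around a model. Raising `tau` enlarges the certified radius at the cost of more abstentions.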

