LG, AI | Apr 18, 2025

Probabilistic Stability Guarantees for Feature Attributions

Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong

arXiv:2504.13787v3 | 12 citations | h-index: 6

Originality: Highly original

AI Analysis

This work addresses the need for reliable and efficient stability certification in explanation methods for machine learning models, offering a practical solution for researchers and practitioners in interpretable AI.

The paper tackles the problem of providing stability guarantees for feature attributions in machine learning. It introduces a model-agnostic certification algorithm that yields non-trivial, interpretable guarantees, and it shows that mild smoothing achieves a more favorable trade-off between accuracy and stability than prior certification methods.

Stability guarantees have emerged as a principled way to evaluate feature attributions, but existing certification methods rely on heavily smoothed classifiers and often produce conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, sample-efficient stability certification algorithm (SCA) that yields non-trivial and interpretable guarantees for any attribution method. Moreover, we show that mild smoothing achieves a more favorable trade-off between accuracy and stability, avoiding the aggressive compromises made in prior certification methods. To explain this behavior, we use Boolean function analysis to derive a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
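
The abstract does not spell out how SCA works, so the following is only a minimal sketch of what a sampling-based stability estimate could look like, not the authors' implementation. It assumes that soft stability is measured as the fraction of sampled perturbations (revealing up to `radius` extra features on top of the explanation) that leave the prediction unchanged, and that the sample count comes from a standard Hoeffding bound. The names `hoeffding_sample_size`, `estimate_stability_rate`, `predict`, and the perturbation sampling scheme are all hypothetical.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Samples needed so the empirical rate is within eps of the true
    rate with probability at least 1 - delta (Hoeffding's inequality)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_stability_rate(predict, x, mask, radius, eps=0.1, delta=0.1, seed=0):
    """Estimate how often the prediction survives adding features.

    predict(x, mask) -> label is a hypothetical classifier that only sees
    features where mask[i] == 1. The sampling scheme (uniform size in
    [0, radius], then uniform subset) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    hidden = [i for i, m in enumerate(mask) if not m]
    base_label = predict(x, mask)
    n_samples = hoeffding_sample_size(eps, delta)

    unchanged = 0
    for _ in range(n_samples):
        # Reveal between 0 and `radius` of the currently hidden features.
        k = rng.randint(0, min(radius, len(hidden)))
        perturbed = list(mask)
        for i in rng.sample(hidden, k):
            perturbed[i] = 1
        if predict(x, perturbed) == base_label:
            unchanged += 1
    return unchanged / n_samples, n_samples


def toy_predict(x, mask):
    """Toy classifier: sign of the sum over visible features."""
    return int(sum(v for v, m in zip(x, mask) if m) > 0)


x = [2.0, -1.0, 0.5, -3.0, 1.0]
mask = [1, 0, 1, 0, 0]  # explanation selects features 0 and 2
rate, n = estimate_stability_rate(toy_predict, x, mask, radius=2)
print(f"estimated stability rate: {rate:.2f} from {n} samples")
```

Because the estimate only queries `predict`, it works with any classifier and any attribution method that produces a binary mask. The number of model calls is set by `eps` and `delta` and does not grow with the number of features.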
