LG | Aug 18, 2023

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

arXiv:2308.09381v35.34 citations | h-index: 25 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for safe and flexible explanations of AI models, particularly for image data. It is an incremental advance that builds on existing gradient-based attribution methods.

The paper tackles the problem of explaining black-box deep learning models by proposing a gradient-estimation-based method that requires only query-level access and achieves performance competitive with white-box methods.

Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by uncovering the most influential features in a to-be-explained decision. While determining feature attributions via gradients delivers promising results, the internal access required to obtain gradients can be impractical under safety concerns, which limits the applicability of gradient-based approaches. In response to this limited flexibility, the paper presents a gradient-estimation-based explanation approach that produces gradient-like explanations through query-level access only. The proposed approach satisfies a set of fundamental properties for attribution methods, which are rigorously proved, ensuring the quality of its explanations. In addition to the theoretical analysis, experiments focused on image data show that the proposed method outperforms state-of-the-art black-box methods and is competitive with methods that have full access to the model.
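To make the idea of estimating gradients from queries concrete, the following is a minimal sketch of a generic zeroth-order estimator (antithetic Gaussian smoothing). It is not the paper's algorithm: the function `estimate_gradient`, its parameters `n_queries` and `sigma`, and the toy linear model `black_box` are all assumptions introduced here for illustration.

```python
import numpy as np

def estimate_gradient(f, x, n_queries=4000, sigma=0.1, seed=0):
    """Estimate grad f(x) using only calls to f (no internal model access).

    Hypothetical helper for illustration: a generic antithetic
    Gaussian-smoothing estimator, not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    n_pairs = n_queries // 2          # each pair costs two queries
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_pairs):
        u = rng.standard_normal(x.shape)
        # Central difference of the model output along random direction u.
        delta = f(x + sigma * u) - f(x - sigma * u)
        grad += delta / (2.0 * sigma) * u
    return grad / n_pairs

# Toy "black box": a linear scorer whose true gradient is w everywhere.
w = np.array([1.0, -2.0, 0.5, 0.0])

def black_box(x):
    return float(w @ x)

x0 = np.zeros(4)
attribution = estimate_gradient(black_box, x0)
print(np.round(attribution, 2))  # noisy approximation of w
```

For a linear model the estimate is unbiased, so the only error is sampling noise that shrinks as the query budget grows. For an image classifier, `x` would be the flattened pixel array and `f` the score of the explained class, and far more queries would typically be needed.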
