Gradient-free Post-hoc Explainability Using Distillation Aided Learnable Approach
This work addresses the need for post-hoc explainability of AI models that offer only query access, which matters to users and developers working with opaque large models. It appears incremental, however, since it builds on existing distillation and saliency methods.
The paper tackles the problem of explaining black-box AI models without gradient access by proposing a distillation-aided framework (DAX) that generates saliency-based explanations, and it significantly outperforms nine existing methods across image and audio modalities in various evaluations.
Recent advances in artificial intelligence (AI), with the release of several large models that offer only query access, make a strong case for explaining deep models in a post-hoc, gradient-free manner. In this paper, we propose a framework, named distillation-aided explainability (DAX), that generates a saliency-based explanation in a model-agnostic, gradient-free setting. The DAX approach poses the problem of explanation in a learnable setting with a mask generation network and a distillation network. The mask generation network learns to generate a multiplier mask that identifies the salient regions of the input, while the student distillation network aims to approximate the local behavior of the black-box model. We propose a joint optimization of the two networks in the DAX framework using locally perturbed input samples, with the targets derived from input-output access to the black-box model. We extensively evaluate DAX across different modalities (image and audio), in a classification setting, using a diverse set of evaluations (intersection over union with ground truth, deletion-based measures, and subjective human-evaluation-based measures) and benchmark it against $9$ different methods. In these evaluations, DAX significantly outperforms the existing approaches on all modalities and evaluation metrics.
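The joint optimization described in the abstract can be illustrated with a deliberately small sketch. It is not the authors' implementation: the abstract gives neither the loss nor the architectures, so the code below assumes a squared-error distillation loss plus an L1 penalty on the mask, replaces the mask generation network with a free per-feature mask vector, and uses a linear student. The names `black_box` and `dax_sketch` and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def black_box(x):
    """Hypothetical query-only model (assumption): only features 0 and 1 matter."""
    return sigmoid(2.0 * x[0] - 1.5 * x[1])


def dax_sketch(x, n_samples=200, epochs=300, lr=0.1, l1=0.01, noise=0.5, seed=0):
    """Jointly fit a multiplier mask and a linear student around the input x.

    Assumed objective (not taken from the paper):
        mean((student(mask * x_tilde) - black_box(x_tilde))**2) + l1 * sum(mask)
    Returns the learned mask and the per-epoch distillation error.
    """
    rng = random.Random(seed)
    d = len(x)

    # Locally perturbed copies of the input, labelled by querying the black box.
    # The black box is only called here; its gradients are never used.
    samples = [[xi + rng.gauss(0.0, noise) for xi in x] for _ in range(n_samples)]
    targets = [black_box(s) for s in samples]

    theta = [0.0] * d                            # mask logits; mask = sigmoid(theta)
    w = [rng.gauss(0.0, 0.1) for _ in range(d)]  # linear student weights
    b = 0.0                                      # student bias
    history = []

    for _ in range(epochs):
        mask = [sigmoid(t) for t in theta]
        grad_w, grad_m = [0.0] * d, [0.0] * d
        grad_b, mse = 0.0, 0.0
        for s, y in zip(samples, targets):
            # Student prediction on the masked, perturbed input.
            pred = b + sum(w[j] * mask[j] * s[j] for j in range(d))
            err = pred - y
            mse += err * err / n_samples
            grad_b += 2.0 * err / n_samples
            for j in range(d):
                grad_w[j] += 2.0 * err * mask[j] * s[j] / n_samples
                grad_m[j] += 2.0 * err * w[j] * s[j] / n_samples
        history.append(mse)

        # Joint gradient step: student and mask are updated together.
        for j in range(d):
            w[j] -= lr * grad_w[j]
            # Chain rule through the sigmoid, plus the L1 sparsity term.
            theta[j] -= lr * (grad_m[j] + l1) * mask[j] * (1.0 - mask[j])
        b -= lr * grad_b

    return [sigmoid(t) for t in theta], history


mask, history = dax_sketch([1.0, -1.0, 0.5, 0.0])
print("mask:", [round(m, 3) for m in mask])
print("distillation error, first vs last epoch:", history[0], history[-1])
```

The point of the sketch is the information flow: the black box is queried only to label the perturbed samples, and every gradient that shapes the mask passes through the student. In a setting closer to the one the abstract describes, the mask would be the output of a network applied to the input, and the student would be a nonlinear model.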