Why Uncertainty Calibration Matters for Reliable Perturbation-based Explanations
The work addresses the need for more reliable and transparent AI explanations, particularly in domains such as computer vision. Its contribution is incremental, since it builds on existing calibration and explanation methods.
The paper tackles the problem of unreliable perturbation-based explanations in machine learning. It shows that poor uncertainty calibration under explainability-specific perturbations directly undermines explanation quality, and it introduces ReCalX to recalibrate models for this setting. In experiments on computer vision models, the recalibrated models yield explanations that are more aligned with human perception and with actual object locations.
Perturbation-based explanations are widely used to enhance the transparency of modern machine-learning models. However, their reliability is often compromised by the unknown model behavior under the specific perturbations used. This paper investigates the relationship between uncertainty calibration (the alignment of model confidence with actual accuracy) and perturbation-based explanations. We show that models frequently produce unreliable probability estimates when subjected to explainability-specific perturbations and theoretically prove that this directly undermines explanation quality. To address this, we introduce ReCalX, a novel approach to recalibrate models for improved perturbation-based explanations while preserving their original predictions. Experiments on popular computer vision models demonstrate that our calibration strategy produces explanations that are more aligned with human perception and actual object locations.
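To make the notion of calibration concrete, the following minimal sketch computes a binned expected calibration error (ECE) on toy logits and then applies temperature scaling, a standard recalibration technique that leaves the predicted class unchanged. It is not the ReCalX procedure, whose details are not given in this summary. The logits, the labels, and the fixed temperature of 3.0 are hypothetical values chosen for illustration; in practice a temperature would be fitted on held-out (here, perturbed) data.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax with temperature; temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, conf in enumerate(confidences):
        # A confidence of exactly 1.0 is placed in the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical logits for four perturbed inputs and their true labels.
# The model is overconfident: about 90% confidence, but only 50% accuracy.
logits = [[3.0, 0.0, 0.0], [0.0, 3.2, 0.0], [2.8, 0.0, 0.0], [0.0, 0.0, 3.0]]
labels = [0, 1, 1, 0]


def evaluate(temperature: float) -> Tuple[List[int], float]:
    """Return predicted classes and ECE at the given temperature."""
    probs = [softmax(z, temperature) for z in logits]
    preds = [p.index(max(p)) for p in probs]
    confs = [max(p) for p in probs]
    hits = [pred == y for pred, y in zip(preds, labels)]
    return preds, expected_calibration_error(confs, hits)


preds_raw, ece_raw = evaluate(1.0)
preds_scaled, ece_scaled = evaluate(3.0)  # assumed temperature, not fitted

print(preds_raw == preds_scaled)  # dividing logits by T > 0 keeps the argmax
print(ece_raw, ece_scaled)
```

On this toy data the scaled confidences sit much closer to the 50% accuracy, so the ECE drops while every predicted class stays the same. With so few samples a binned ECE is a noisy estimate, so the numbers only illustrate the mechanism.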