A Rate-Distortion Framework for Explaining Black-box Model Decisions
The work addresses the need for interpretable AI among users of complex models, though the contribution appears incremental because it builds on existing perturbation-based explanation methods.
The paper tackles the problem of explaining black-box model decisions by introducing the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method based on input perturbations. The method applies to any differentiable pre-trained model, and the experiments show that it adapts to images, audio, and physical simulations of urban environments.
We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model, such as a neural network. Our experiments demonstrate the framework's adaptability to diverse data modalities, particularly images, audio, and physical simulations of urban environments.
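To make the idea of a perturbation-based explanation for a differentiable model more concrete, the following is a minimal sketch and not the authors' implementation. It learns a mask over the input components so that the model output changes little when the unmasked components are replaced by noise, while a sparsity penalty keeps the mask small. Everything specific here is an assumption introduced for illustration: the toy logistic model standing in for a pre-trained network, the squared-error distortion, the standard Gaussian perturbation, the l1 penalty weight lam, the projected gradient descent, and the function names toy_model and explain_mask.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(x, w, b):
    """Hypothetical stand-in for a differentiable pre-trained model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def explain_mask(x, w, b, lam=0.05, lr=0.1, steps=300, samples=64, seed=0):
    """Learn a mask s in [0, 1]^n by projected gradient descent.

    Assumed objective (illustrative, not taken from the text above):
        E_noise[(model(x) - model(s*x + (1-s)*noise))^2] + lam * sum(s)
    """
    rng = random.Random(seed)
    n = len(x)
    target = toy_model(x, w, b)
    s = [1.0] * n  # start by keeping every input component
    for _ in range(steps):
        grad = [0.0] * n
        for _ in range(samples):
            noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
            # Perturbed input: masked-out components are replaced by noise.
            y = [si * xi + (1.0 - si) * ni for si, xi, ni in zip(s, x, noise)]
            out = toy_model(y, w, b)
            # Chain rule for the squared error through the sigmoid.
            common = -2.0 * (target - out) * out * (1.0 - out)
            for i in range(n):
                grad[i] += common * w[i] * (x[i] - noise[i]) / samples
        # Gradient step on distortion plus l1 penalty, then clip to [0, 1].
        s = [min(1.0, max(0.0, si - lr * (gi + lam))) for si, gi in zip(s, grad)]
    return s


if __name__ == "__main__":
    # Only the first input component influences this toy model.
    mask = explain_mask([1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 0.0, 0.0], -2.0)
    print([round(v, 2) for v in mask])
```

In this toy setting the mask should stay close to 1 on the first component, which is the only one the model depends on, and shrink to 0 on the others. For a real network the hand-written gradient would be replaced by automatic differentiation, which is where the differentiability requirement enters.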