Enhancing Interpretability for Vision Models via Shapley Value Optimization
This work addresses the need for more faithful and compatible interpretability methods for vision models, and it represents an incremental improvement over existing approaches.
The paper tackles the problem of opaque decision-making in deep neural networks by proposing a self-explaining framework that integrates Shapley value estimation into training as an auxiliary task, and it reports state-of-the-art interpretability while preserving model performance.
Deep neural networks (DNNs) have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods aim to shed light on how DNNs reach their decisions, they exhibit significant limitations: post-hoc explanation methods often struggle to faithfully reflect model behavior, while self-explaining neural networks sacrifice performance and compatibility due to their specialized architectural designs. To address these challenges, we propose a novel self-explaining framework that integrates Shapley value estimation as an auxiliary task during training, which achieves two key advancements: 1) a fair allocation of the model prediction scores to image patches, ensuring explanations inherently align with the model's decision logic, and 2) enhanced interpretability with minor structural modifications, preserving model performance and compatibility. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art interpretability.
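To make the "fair allocation" of a prediction score to image patches concrete, the following is a minimal sketch of exact Shapley value computation on a toy scoring function. It is not the paper's training procedure: the names `shapley_values`, `toy_score`, and `PATCH_WEIGHTS` are hypothetical, the four-patch "model" is invented for illustration, and a real vision model would replace `toy_score` with a forward pass on an image in which only the selected patches are visible.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_players):
    """Exact Shapley values of a set function by enumerating all coalitions."""
    players = range(n_players)
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (
                factorial(size)
                * factorial(n_players - size - 1)
                / factorial(n_players)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of patch i when added to the coalition.
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical per-patch evidence for a 4-patch image (illustration only).
PATCH_WEIGHTS = [0.5, 0.25, 0.125, 0.0]


def toy_score(visible):
    """Toy stand-in for a model's prediction score given the visible patches."""
    score = sum(PATCH_WEIGHTS[p] for p in visible)
    # Patches 0 and 1 interact: seen together they add extra evidence.
    if 0 in visible and 1 in visible:
        score += 0.25
    return score


phi = shapley_values(toy_score, 4)
# Score with every patch visible minus score with no patch visible.
total = toy_score(frozenset(range(4))) - toy_score(frozenset())

for patch, value in enumerate(phi):
    print(f"patch {patch}: {value:.3f}")
print(f"sum of attributions: {sum(phi):.3f}, full minus empty score: {total:.3f}")
```

In this toy case the interaction bonus is split equally between patches 0 and 1, the uninformative patch 3 receives zero, and the attributions sum to the difference between the full-image and empty-image scores. Exact enumeration visits every subset of patches, so its cost grows exponentially with the number of patches, which is why Shapley values are estimated rather than computed exactly at realistic patch counts.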