CV Jun 27, 2022

RES: A Robust Framework for Guiding Visual Explanation

Yuyang Gao, Tong Steven Sun, Guangji Bai, Siyi Gu, Sungsoo Ray Hong, Liang Zhao

arXiv:2206.13413v114.939 citations, h-index: 22, Has Code

Originality: Incremental advance

AI Analysis

This work addresses the under-explored issue of explanation quality in vision-based AI, offering a solution for researchers and practitioners needing more reliable interpretability, though it is incremental in advancing explanation supervision techniques.

The paper tackles the problem of improving the quality of visual explanations in deep neural networks by addressing challenges like inaccurate boundaries and incomplete regions in human annotations, proposing a RES framework that enhances explanation reasonability and model performance on real-world image datasets.

Despite the fast progress of explanation techniques in modern Deep Neural Networks (DNNs) where the main focus is handling "how to generate the explanations", advanced research questions that examine the quality of the explanation itself (e.g., "whether the explanations are accurate") and improve the explanation quality (e.g., "how to adjust the model to generate more accurate explanations when explanations are inaccurate") are still relatively under-explored. To guide the model toward better explanations, techniques in explanation supervision - which add supervision signals on the model explanation - have started to show promising effects on improving both the generalizability and intrinsic interpretability of Deep Neural Networks. However, the research on supervising explanations, especially in vision-based applications represented through saliency maps, is in its early stage due to several inherent challenges: 1) inaccuracy of the human explanation annotation boundary, 2) incompleteness of the human explanation annotation region, and 3) inconsistency of the data distribution between human annotation and model explanation maps. To address the challenges, we propose a generic RES framework for guiding visual explanation by developing a novel objective that handles inaccurate boundary, incomplete region, and inconsistent distribution of human annotations, with a theoretical justification of model generalizability. Extensive experiments on two real-world image datasets demonstrate the effectiveness of the proposed framework in enhancing both the reasonability of the explanation and the performance of the backbone DNN model.
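To make the idea of explanation supervision concrete, here is a minimal sketch of a generic objective that adds a saliency-alignment penalty to an ordinary prediction loss. It is not the RES objective from the paper: the function names, the min-max normalization, the mean-absolute-difference penalty, and the weight `lam` are all assumptions chosen for illustration, and the sketch does nothing to handle inaccurate boundaries, incomplete regions, or inconsistent distributions.

```python
import numpy as np


def cross_entropy(probs, label):
    """Prediction loss: negative log-likelihood of the true class."""
    return -float(np.log(probs[label] + 1e-12))


def explanation_loss(saliency, annotation):
    """Mean absolute difference between a min-max-normalized saliency map
    and a binary human annotation mask (an illustrative choice of distance)."""
    s = saliency.astype(float)
    span = s.max() - s.min()
    # Rescale to [0, 1] so the map is comparable to a 0/1 mask.
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return float(np.mean(np.abs(s - annotation)))


def supervised_objective(probs, label, saliency, annotation, lam=1.0):
    """Prediction loss plus a weighted explanation penalty.
    `lam` is a hypothetical trade-off weight, not a value from the paper."""
    return cross_entropy(probs, label) + lam * explanation_loss(saliency, annotation)


# Toy 2x2 example: the annotation marks the right-hand column as relevant.
probs = np.array([0.1, 0.9])
annotation = np.array([[0, 1], [0, 1]])
aligned = np.array([[0.0, 2.0], [0.0, 2.0]])     # saliency agrees with the mask
misaligned = np.array([[2.0, 0.0], [2.0, 0.0]])  # saliency highlights the wrong column

loss_aligned = supervised_objective(probs, 1, aligned, annotation)
loss_misaligned = supervised_objective(probs, 1, misaligned, annotation)
```

With identical class predictions, the misaligned saliency map yields the larger objective, which is the signal that would push a model toward explanations matching the human annotation. A pixel-exact penalty like this one treats every annotation error as ground truth, which is the weakness the three challenges listed in the abstract point to.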
