CV, Nov 17, 2017

Attentive Explanations: Justifying Decisions and Pointing to the Evidence (Extended Abstract)

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

arXiv:1711.07373v1, 2.42 citations

Originality: Incremental advance

AI Analysis

This work addresses the bottleneck of explainable AI for visual decision systems by providing datasets and a method to enhance model transparency, though it is incremental in building on existing explainability efforts.

The authors tackled the lack of multimodal explanation data by proposing two large-scale datasets (ACT-X and VQA-X) with visual and textual justification annotations for classification and question answering, and introduced a multimodal method that improves both textual justification and evidence localization.

Deep models are the de facto standard in visual decision problems due to their impressive performance on a wide array of visual tasks. On the other hand, their opaqueness has led to a surge of interest in explainable systems. In this work, we emphasize the importance of model explanation in various forms such as visual pointing and textual justification. The lack of data with justification annotations is one of the bottlenecks of generating multimodal explanations. Thus, we propose two large-scale datasets with annotations that visually and textually justify a classification decision for various activities, i.e., ACT-X, and for question answering, i.e., VQA-X. We also introduce a multimodal methodology for generating visual and textual explanations simultaneously. We quantitatively show that training with the textual explanations not only yields better textual justification models, but also models that better localize the evidence that supports their decision.
