CV · Mar 20, 2018

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

arXiv:1803.07464v2 · 121 citations
AI Analysis

This work addresses the need for more understandable and traceable VQA systems, though the contribution is incremental because it builds on existing VQA frameworks.

The authors address the lack of explanations in visual question answering (VQA) by proposing VQA-E, a task that requires models to generate an explanation alongside each answer. They show that the added supervision from explanations improves answer prediction, and they report outperforming state-of-the-art methods on the VQA v2 dataset.

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations. We argue that the explanation for an answer is of the same or even more importance compared with the answer itself, since it makes the question and answering process more understandable and traceable. To this end, we propose a new task of VQA-E (VQA with Explanation), where the computational models are required to generate an explanation with the predicted answer. We first construct a new dataset, and then frame the VQA-E problem in a multi-task learning architecture. Our VQA-E dataset is automatically derived from the VQA v2 dataset by intelligently exploiting the available captions. We have conducted a user study to validate the quality of explanations synthesized by our method. We quantitatively show that the additional supervision from explanations can not only produce insightful textual sentences to justify the answers, but also improve the performance of answer prediction. Our model outperforms the state-of-the-art methods by a clear margin on the VQA v2 dataset.

