CL · CV · Sep 8, 2018

Faithful Multimodal Explanation for Visual Question Answering

arXiv:1809.02805v21130 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for trustworthy and comprehensible AI explanations in VQA, though it appears incremental because it builds on existing methods to enhance explanatory capability.

The paper tackles the problem of opaque black-box AI systems in visual question answering by developing a VQA system that provides integrated textual and visual explanations, demonstrating advantages over competing methods through extensive evaluation with automatic and human metrics.

AI systems' ability to explain their reasoning is critical to their utility and trustworthiness. Deep neural networks have enabled significant progress on many challenging problems such as visual question answering (VQA). However, most of them are opaque black boxes with limited explanatory capability. This paper presents a novel approach to developing a high-performing VQA system that can elucidate its answers with integrated textual and visual explanations that faithfully reflect important aspects of its underlying reasoning while capturing the style of comprehensible human explanations. Extensive experimental evaluation demonstrates the advantages of this approach compared to competing methods with both automatic evaluation metrics and human evaluation metrics.

Code Implementations: 1 repo
