CV · CL · May 24, 2019

Self-Critical Reasoning for Robust Visual Question Answering

arXiv:1905.09998v3 · 174 citations
Originality: Incremental advance
AI Analysis

This work addresses the poor generalization of VQA systems, though the advance is incremental because it builds on existing methods for reducing language priors.

The paper tackles the tendency of visual question answering (VQA) systems to rely on superficial statistical correlations. It introduces a self-critical training objective that aligns the visual explanations of correct answers with the most influential image regions, and reports state-of-the-art results on the VQA-CP dataset: 49.5% using textual explanations and 48.5% using automatically annotated regions.

Visual Question Answering (VQA) deep-learning systems tend to capture superficial statistical correlations in the training data because of strong language priors and fail to generalize to test data with a significantly different question-answer (QA) distribution. To address this issue, we introduce a self-critical training objective that ensures that visual explanations of correct answers match the most influential image regions more than other competitive answer candidates. The influential regions are either determined from human visual/textual explanations or automatically from just significant words in the question and answer. We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state-of-the-art i.e., 49.5% using textual explanations and 48.5% using automatically annotated regions.
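The core idea of the objective can be illustrated with a small sketch. It is not the paper's exact formulation: it assumes a sensitivity score for every (answer, region) pair has already been computed (for example from gradients of the answer score with respect to region features), and it uses a simple hinge penalty. The function name `self_critical_penalty` and all argument names are hypothetical.

```python
def self_critical_penalty(sensitivities, correct_idx, competitor_idxs, influential_region):
    """Hinge-style penalty, a sketch under stated assumptions.

    sensitivities[a][r] is assumed to measure how strongly the score of
    answer `a` depends on image region `r` (hypothetical precomputed input).
    The penalty is positive whenever a competing answer is more sensitive
    to the influential region than the correct answer is.
    """
    s_correct = sensitivities[correct_idx][influential_region]
    penalty = 0.0
    for a in competitor_idxs:
        # Only penalize competitors that "use" the region more than the correct answer.
        penalty += max(0.0, sensitivities[a][influential_region] - s_correct)
    return penalty


if __name__ == "__main__":
    # Toy scores: 3 candidate answers (rows) x 2 image regions (columns).
    toy = [
        [0.2, 0.9],   # correct answer
        [0.5, 0.1],   # competitor 1
        [0.1, 0.95],  # competitor 2
    ]
    print(self_critical_penalty(toy, correct_idx=0, competitor_idxs=[1, 2], influential_region=1))
```

In this toy case only the second competitor contributes to the penalty, because it alone is more sensitive to the influential region than the correct answer. Adding such a term to the usual VQA loss would push the model to ground correct answers in the regions that the human or automatic annotations mark as important.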

Code Implementations: 1 repo