CV · CL · Jul 8, 2020

IQ-VQA: Intelligent Visual Question Answering

arXiv:2007.04422v1 · 7 citations
AI Analysis

This work targets applications that need consistent, robust AI systems, where current VQA models are unreliable. It is an incremental improvement built around a novel framework.

The paper tackles the problem of inconsistency and brittleness in Visual Question Answering models by proposing a model-independent cyclic framework that improves consistency by ~15% on a rule-based dataset and ~7% on a new human-annotated dataset, while also enhancing robustness by ~2% without degrading performance.

Even though there has been tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. To this end, we propose a model-independent cyclic framework which increases consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer and then also learn to answer the generated implication correctly. As a part of the cyclic framework, we propose a novel implication generator which can generate implied questions from any question-answer pair. As a baseline for future work on consistency, we provide a new human-annotated VQA-Implications dataset. The dataset consists of ~30k questions containing implications of 3 types - Logical Equivalence, Necessary Condition and Mutual Exclusion - made from the VQA v2.0 validation dataset. We show that our framework improves consistency of VQA models by ~15% on the rule-based dataset, ~7% on the VQA-Implications dataset and robustness by ~2%, without degrading their performance. In addition, we also quantitatively show improvement in attention maps, which highlights better multi-modal understanding of vision and language.
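The abstract reports consistency gains but does not say how consistency is scored. As a rough illustration only, the sketch below assumes one common definition: among questions the model answers correctly, the fraction of their implications it also answers correctly. The names `Implication`, `Example`, `consistency`, and `toy_model`, and the bird questions, are hypothetical and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Implication:
    question: str  # implied question derived from a question-answer pair
    answer: str    # expected answer to the implied question


@dataclass
class Example:
    image_id: str
    question: str
    answer: str
    implications: List[Implication]


def consistency(model: Callable[[str, str], str], examples: List[Example]) -> float:
    """Assumed metric: share of implications answered correctly, counted
    only for examples whose original question is answered correctly."""
    correct = 0
    total = 0
    for ex in examples:
        if model(ex.image_id, ex.question) != ex.answer:
            continue  # skip examples where the original answer is wrong
        for imp in ex.implications:
            total += 1
            if model(ex.image_id, imp.question) == imp.answer:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical stand-in for a VQA model: a lookup table of fixed answers.
toy_answers = {
    ("img1", "How many birds are there?"): "2",
    ("img1", "Are there 2 birds?"): "yes",    # logical equivalence
    ("img1", "Are there any birds?"): "yes",  # necessary condition
    ("img1", "Are there 3 birds?"): "yes",    # mutual exclusion, answered wrongly
}


def toy_model(image_id: str, question: str) -> str:
    return toy_answers.get((image_id, question), "unknown")


examples = [
    Example(
        image_id="img1",
        question="How many birds are there?",
        answer="2",
        implications=[
            Implication("Are there 2 birds?", "yes"),
            Implication("Are there any birds?", "yes"),
            Implication("Are there 3 birds?", "no"),
        ],
    )
]

score = consistency(toy_model, examples)
print(f"consistency = {score:.2f}")
```

The toy model answers the original question and two of the three implications correctly, so it is accurate on the original question yet inconsistent on the mutual-exclusion implication. This is the kind of failure the cyclic framework is meant to reduce.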

Code Implementations: 1 repo