CV · Mar 16, 2023

Logical Implications for Visual Question Answering Consistency

arXiv:2303.09427v1 · 11 citations · h-index: 35
AI Analysis

The paper addresses unreliable reasoning in VQA systems. The contribution is incremental: it adds a new loss function to existing models rather than proposing a new architecture.

The paper tackles inconsistent answers in Visual Question Answering (VQA) models by proposing a consistency loss term that directly penalizes logical inconsistencies. It reports improvements for state-of-the-art VQA models on the VQA Introspect and DME datasets.

Despite considerable recent progress in Visual Question Answering (VQA) models, inconsistent or contradictory answers continue to cast doubt on their true reasoning capabilities. However, most proposed methods use indirect strategies or strong assumptions on pairs of questions and answers to enforce model consistency. Instead, we propose a novel strategy intended to improve model performance by directly reducing logical inconsistencies. To do this, we introduce a new consistency loss term that can be used by a wide range of VQA models and that relies on knowing the logical relation between pairs of questions and answers. While such information is typically not available in VQA datasets, we propose to infer these logical relations using a dedicated language model and use them in our proposed consistency loss function. We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method brings improvements to state-of-the-art VQA models, while being robust across different architectures and settings.
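To make the idea of a loss that "relies on knowing the logical relation between pairs of questions and answers" concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the function names, the relation labels ("->", "<-", "<->", "-"), and the product-based violation score are all assumptions chosen for illustration. Each question-answer pair is treated as a proposition with a soft truth value in [0, 1] (for example, the probability a model assigns to that answer), and "A implies B" is violated to the degree that A is believed true while B is believed false.

```python
import math


def implication_penalty(p_a: float, p_b: float, eps: float = 1e-7) -> float:
    """Penalty for violating 'A implies B' given soft truth values.

    Hypothetical scoring rule: the violation degree is p_a * (1 - p_b),
    i.e. how strongly A is held true while B is held false.
    """
    violation = p_a * (1.0 - p_b)
    # Clamp so the logarithm below stays finite.
    violation = min(max(violation, 0.0), 1.0 - eps)
    return -math.log(1.0 - violation)


def consistency_loss(pairs, relations) -> float:
    """Average implication penalty over a batch of proposition pairs.

    pairs:     list of (p_a, p_b) soft truth values.
    relations: list of labels, one per pair (assumed label set):
               "->"  A implies B
               "<-"  B implies A
               "<->" both directions
               "-"   no logical relation (contributes nothing)
    """
    if not pairs:
        return 0.0
    total = 0.0
    for (p_a, p_b), rel in zip(pairs, relations):
        if rel in ("->", "<->"):
            total += implication_penalty(p_a, p_b)
        if rel in ("<-", "<->"):
            total += implication_penalty(p_b, p_a)
    return total / len(pairs)


if __name__ == "__main__":
    batch = [(0.9, 0.1), (0.9, 0.9), (0.2, 0.8)]
    labels = ["->", "->", "-"]
    print(consistency_loss(batch, labels))
```

In a training setup, a term like this would be added to the usual answer-classification loss with a weighting factor, and the relation labels would come from an external source; the abstract proposes inferring them with a dedicated language model.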

Code Implementations: 1 repo
