CL · Mar 28, 2021

'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks

arXiv:2103.15022v2 · 11 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in evaluating open-ended VQA tasks that affects both researchers and developers. The advance is incremental because it builds on existing datasets and methods.

The paper tackles a limitation of visual question answering (VQA) datasets: they assume a single ground-truth answer per question, so models are penalized for answers that are semantically correct but do not match it. The authors propose Alternative Answer Sets (AAS) created automatically with off-the-shelf NLP tools, introduce a semantic metric based on AAS, and report performance improvements on the GQA dataset.

GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context. In fact, this is the case with most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. To address this limitation, we propose Alternative Answer Sets (AAS) of ground-truth answers, which are created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show performance improvements. Code and data are available at https://github.com/luomancs/alternative_answer_set.git.
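The idea behind an AAS-based metric can be illustrated with a minimal sketch. It assumes the simplest possible scoring rule: a prediction counts as correct if it equals the ground-truth answer or appears in that answer's alternative set. The function names and the hand-written `toy_aas` mapping are hypothetical and exist only for illustration. The abstract does not specify how the sets are built beyond "off-the-shelf NLP tools", nor the exact form of the metric, so this should not be read as the authors' implementation.

```python
from typing import Dict, List, Set


def exact_match_accuracy(predictions: List[str], ground_truths: List[str]) -> float:
    """Standard single-answer accuracy: only an exact string match counts."""
    correct = sum(pred == gt for pred, gt in zip(predictions, ground_truths))
    return correct / len(ground_truths)


def aas_accuracy(
    predictions: List[str],
    ground_truths: List[str],
    alternative_sets: Dict[str, Set[str]],
) -> float:
    """Accuracy where any member of the ground truth's alternative set counts.

    Assumption: alternative_sets maps a ground-truth answer to its accepted
    alternatives. The ground truth itself is always accepted.
    """
    correct = 0
    for pred, gt in zip(predictions, ground_truths):
        accepted = {gt} | alternative_sets.get(gt, set())
        if pred in accepted:
            correct += 1
    return correct / len(ground_truths)


# Hypothetical, hand-written alternative sets (not taken from the paper's data).
toy_aas = {
    "couch": {"sofa"},
    "man": {"person", "guy"},
}

preds = ["sofa", "person", "dog"]
gts = ["couch", "man", "cat"]

exact = exact_match_accuracy(preds, gts)
relaxed = aas_accuracy(preds, gts, toy_aas)
print(f"exact match: {exact:.2f}, AAS-based: {relaxed:.2f}")
```

In this toy run, "sofa" and "person" are rejected by exact matching but accepted under the alternative sets, while "dog" is rejected by both. That is the gap between the two scores that the abstract describes.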

Code Implementations: 1 repo