CV | Nov 27, 2023

Fully Authentic Visual Question Answering Dataset from Online Communities

Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari

arXiv:2311.15562v4 | 8.4 | 10 citations | h-index: 22 | Has Code

Originality: Synthesis-oriented

AI Analysis

This dataset addresses the need for more realistic VQA benchmarks by providing authentic data. The contribution is incremental: it centers on dataset creation and evaluation rather than on new methods.

The authors introduce VQAonline, the first Visual Question Answering dataset sourced entirely from authentic online community forums, with answers averaging 173 words. They evaluate six state-of-the-art models using metrics designed for longer text to identify where the models struggle, and they analyze which metrics align best with human judgments.

Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We characterize this dataset and how it relates to eight mainstream VQA datasets. Observing that answers in our dataset tend to be much longer (i.e., a mean of 173 words) and so incompatible with standard VQA evaluation metrics, we instead use popular metrics for longer text to evaluate six state-of-the-art VQA models on VQAonline and report where they struggle most. Finally, we analyze which evaluation metrics align best with human judgments. To facilitate future extensions, we publicly share the dataset at: https://vqaonline.github.io/.
