CV, AI, LG | May 2, 2021

A survey on VQA: Datasets and Approaches

arXiv:2105.00421v1 | 22 citations
Originality: Synthesis-oriented
AI Analysis

The paper offers VQA researchers a survey of the field, but it is incremental: it summarizes existing work without new contributions.

This paper reviews and analyzes existing datasets, metrics, and models for the visual question answering (VQA) task, which combines computer vision and natural language processing to answer text-based questions from visual information.

Visual question answering (VQA) is a task that combines techniques from computer vision and natural language processing. It requires a model to answer a text-based question using the information contained in a visual input. In recent years, the VQA research field has expanded. Work that examines the reasoning ability of VQA models, and work on VQA over scientific diagrams, has received growing attention. Meanwhile, more multimodal feature fusion mechanisms have been proposed. This paper reviews and analyzes existing datasets, metrics, and models proposed for the VQA task.

