CV, AI, CL | Oct 5, 2016

Visual Question Answering: Datasets, Algorithms, and Future Challenges

arXiv:1610.01465v4 | 265 citations
AI Analysis

The paper provides a critical overview for researchers in computer vision and NLP, but its contribution is incremental: it synthesizes existing work without presenting new results.

This review paper examines the current state of Visual Question Answering (VQA), analyzing existing datasets, algorithms, and evaluation metrics; it identifies limitations of current datasets and proposes directions for future research.

Visual Question Answering (VQA) is a recent problem in computer vision and natural language processing that has garnered a large amount of interest from the deep learning, computer vision, and natural language processing communities. In VQA, an algorithm needs to answer text-based questions about images. Since the release of the first VQA dataset in 2014, additional datasets have been released and many algorithms have been proposed. In this review, we critically examine the current state of VQA in terms of problem formulation, existing datasets, evaluation metrics, and algorithms. In particular, we discuss the limitations of current datasets with regard to their ability to properly train and assess VQA algorithms. We then exhaustively review existing algorithms for VQA. Finally, we discuss possible future directions for VQA and image understanding research.

Code Implementations: 1 repo