CV · CL · Dec 7, 2015

Simple Baseline for Visual Question Answering

arXiv:1512.02167v2 · 339 citations · Has Code
Originality Synthesis-oriented
AI Analysis

This work provides a simple, incremental baseline for researchers in visual question answering, highlighting the effectiveness of basic methods.

The authors tackled visual question answering by proposing a simple bag-of-words baseline that combines word and CNN features, achieving comparable performance to recent RNN-based methods on the VQA dataset.

We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we also provide an interactive web demo and open-source code.
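
The concatenate-then-classify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the vocabulary, the two-dimensional stand-in for CNN image features, the hand-set weights, and the function names (`bag_of_words`, `predict_answer`) are all assumptions made for illustration, and raw word counts are used where a trained model would learn word features.

```python
import math


def bag_of_words(question, vocab):
    """Return a word-count vector over `vocab` for the given question."""
    counts = [0.0] * len(vocab)
    for word in question.lower().replace("?", "").split():
        if word in vocab:
            counts[vocab[word]] += 1.0
    return counts


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(question, image_features, vocab, weights, biases, answers):
    """Concatenate question and image features, then apply a linear softmax classifier."""
    x = bag_of_words(question, vocab) + list(image_features)  # concatenation
    scores = [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


# Toy setup (all values hypothetical, chosen by hand rather than learned).
vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "sky": 4, "grass": 5}
answers = ["blue", "green"]
image_features = [0.2, 0.9]  # stand-in for a CNN feature vector

# One weight row per candidate answer: 6 word dimensions + 2 image dimensions.
weights = [
    [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.5, 0.0],  # "blue"
    [0.0, 0.0, 0.0, 0.0, -1.0, 1.0, 0.0, 0.5],  # "green"
]
biases = [0.0, 0.0]

answer, probs = predict_answer(
    "What color is the sky?", image_features, vocab, weights, biases, answers
)
print(answer, probs)
```

In a trained model the weights would be learned from question, image, and answer triples, and the image vector would come from a pretrained CNN; here both are fixed by hand so the data flow stays visible.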

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
