CV · CL · Dec 7, 2015

Simple Baseline for Visual Question Answering

Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

arXiv:1512.02167v232.7339 citations · h-index: 76 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides a simple, incremental baseline for researchers in visual question answering, highlighting the effectiveness of basic methods.

The authors tackle visual question answering with a simple bag-of-words baseline that combines question word features with CNN image features, achieving performance comparable to recent RNN-based methods on the VQA dataset.

We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strengths and weaknesses of the trained model, we also provide an interactive web demo and open-source code.
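The concatenate-then-classify idea in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the vocabulary, the answer list, the 4-dimensional `image_features` vector (a stand-in for real CNN features), and the random, untrained weights are all assumptions made for illustration, and the sketch uses raw word counts where a trained model would likely use learned word features.

```python
import math
import random


def bag_of_words(question, vocab):
    """Count how often each vocabulary word occurs in the question."""
    index = {word: i for i, word in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for token in question.lower().replace("?", " ").split():
        if token in index:
            counts[index[token]] += 1.0
    return counts


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(question, image_features, vocab, weights, biases, answers):
    """Concatenate question and image features, then apply a linear softmax layer."""
    features = bag_of_words(question, vocab) + list(image_features)
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


# Hypothetical toy setup: tiny vocabulary, three candidate answers,
# and a made-up feature vector standing in for CNN image features.
vocab = ["what", "color", "is", "the", "cat", "how", "many"]
answers = ["black", "two", "yes"]
image_features = [0.2, 0.7, 0.1, 0.5]

# Untrained weights, seeded only so the run is repeatable.
rng = random.Random(0)
dim = len(vocab) + len(image_features)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in answers]
biases = [0.0] * len(answers)

answer, probs = predict(
    "What color is the cat?", image_features, vocab, weights, biases, answers
)
print(answer, probs)
```

Because the weights are random, the printed answer is meaningless; the point is only the data flow, in which one fixed-length vector per question-image pair feeds a single classification layer over a fixed answer set.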
