CV · AI · CL · May 5, 2015

Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

arXiv:1505.01121v3 · 635 citations
Originality: Incremental advance
AI Analysis

This work addresses a multi-modal question answering task that spans computer vision and NLP, and it is an incremental advance over prior methods.

The paper tackles the problem of answering questions about real-world images by proposing Neural-Image-QA, an end-to-end neural-based approach that conditions answers on both visual and language inputs, doubling the performance of the previous best method on the DAQUAR dataset.

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation to this problem for which all parts are trained jointly. In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language input (image and question). Our approach Neural-Image-QA doubles the performance of the previous best approach on this problem. We provide additional insights into the problem by analyzing how much information is contained only in the language part for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers which extends the original DAQUAR dataset to DAQUAR-Consensus.

