CV · Nov 23, 2015

Where To Look: Focus Regions for Visual Question Answering

Kevin J. Shih, Saurabh Singh, Derek Hoiem

arXiv:1511.07394v2 · 484 citations

Originality: Incremental advance

AI Analysis

The paper addresses accuracy in visual question answering (VQA), though it appears incremental because it builds on existing region-selection methods.

The paper tackles visual question answering by learning to select relevant image regions for text-based queries, achieving significant improvements on specific question types like 'what color' and 'what room'.

We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method shows significant improvements on questions such as "what color," where a specific location must be evaluated, and "what room," where it selectively identifies informative image regions. We test our model on the VQA dataset, which is, to our knowledge, the largest human-annotated visual question answering dataset.
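The core idea of weighting image regions by their relevance to the question can be illustrated with generic soft attention. The sketch below is a hedged illustration, not the authors' exact model: it assumes each region and the question have already been embedded into a shared vector space, scores relevance with a plain dot product, normalizes the scores with a softmax, and pools the region features by those weights. The function names (`attend_regions`, `softmax`, `dot`) and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_regions(
    region_feats: Sequence[Sequence[float]], query_emb: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: weight regions by query relevance.

    Assumes region features and the query embedding share one space.
    Returns (per-region weights, weighted average of region features).
    """
    scores = [dot(r, query_emb) for r in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    pooled = [
        sum(w * r[d] for w, r in zip(weights, region_feats)) for d in range(dim)
    ]
    return weights, pooled


# Toy example: three regions, a query that "points at" the first dimension.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [2.0, 0.0]
weights, pooled = attend_regions(regions, query)
# Region 0 aligns best with the query, so it receives the largest weight.
```

In a trained system the embeddings would be learned so that, for a "what color" question, the region containing the queried object scores highest; here the vectors are fixed by hand purely to show the weighting mechanics.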

