AI · CL · CV · NE · ML · Feb 1, 2018

Dual Recurrent Attention Units for Visual Question Answering

arXiv:1802.00209v3 · 33 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for robust attention mechanisms in VQA models, which matter for any AI system that processes both vision and text. It is an incremental advance because it builds on existing attention-based methods.

The paper tackles the problem of improving attention mechanisms in Visual Question Answering (VQA) by proposing a recurrent attention mechanism, showing that it outperforms traditional convolutional approaches and achieves competitive results, including outperforming the first-place winner on the VQA 2016 challenge and improving upon the VQA 2017 winner.

Visual Question Answering (VQA) requires AI models to comprehend data in two domains, vision and text. Current state-of-the-art models use learned attention mechanisms to extract relevant information from the input domains to answer a given question. Thus, robust attention mechanisms are essential for powerful VQA models. In this paper, we propose a recurrent attention mechanism and show its benefits compared to the traditional convolutional approach. We perform two ablation studies to evaluate recurrent attention. First, we introduce a baseline VQA model with visual attention and test the performance difference between convolutional and recurrent attention on the VQA 2.0 dataset. Second, we design an architecture for VQA which utilizes dual (textual and visual) Recurrent Attention Units (RAUs). Using this model, we show the effect of all possible combinations of recurrent and convolutional dual attention. Our single model outperforms the first-place winner of the VQA 2016 challenge and, to the best of our knowledge, is the second-best-performing single model on the VQA 1.0 dataset. Furthermore, our model noticeably improves upon the winner of the VQA 2017 challenge. Moreover, we experiment with replacing attention mechanisms in state-of-the-art models with our RAUs and show increased performance.
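The abstract contrasts convolutional and recurrent attention without giving the internals of an RAU, so the following is only a rough sketch of the general distinction, not the paper's architecture. It assumes that "convolutional" attention scores each position independently (as a 1x1 convolution would) and that "recurrent" attention reads each score from the hidden state of an RNN scanning the positions. A plain tanh RNN stands in for whatever recurrent cell the authors use, and every function name, weight, and dimension here is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, v):
    return [dot(row, v) for row in matrix]


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def conv_attention_scores(features, w):
    # A 1x1 convolution applies the same linear map to every position,
    # so each score depends only on that position's own feature vector.
    return [dot(w, f) for f in features]


def recurrent_attention_scores(features, w_in, w_h, w_out):
    # A simple tanh RNN (an assumption, not the paper's cell) scans the
    # positions in order; each score is read from the hidden state, so it
    # also depends on every position seen before it.
    h = [0.0] * len(w_h)
    scores = []
    for f in features:
        pre = [a + b for a, b in zip(matvec(w_in, f), matvec(w_h, h))]
        h = [math.tanh(p) for p in pre]
        scores.append(dot(w_out, h))
    return scores


def attend(features, scores):
    # Normalize scores into attention weights, then pool the features.
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


# Toy input: two positions (image regions or question tokens), 2-d features.
features = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
w_out = [1.0, 1.0]

conv_weights, conv_context = attend(features, conv_attention_scores(features, w))
rec_weights, rec_context = attend(
    features, recurrent_attention_scores(features, w_in, w_h, w_out)
)
```

In this toy setup, changing the first position's features leaves the convolutional score of the second position untouched but shifts its recurrent score, which is the kind of context dependence a recurrent attention unit could exploit.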

Code Implementations: 1 repo