LGCVNEDec 24, 2014

Multiple Object Recognition with Visual Attention

arXiv:1412.7755v2329 citations
Originality Highly original
AI Analysis

This addresses the challenge of multi-object recognition in computer vision, offering a more efficient and accurate solution for tasks like digit transcription in street view images.

The paper tackles the problem of recognizing multiple objects in images by introducing an attention-based deep recurrent neural network trained with reinforcement learning, achieving higher accuracy than state-of-the-art convolutional networks on Google Street View house number transcription with fewer parameters and less computation.

We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show that the model learns to both localize and recognize multiple objects despite being given only class labels during training. We evaluate the model on the challenging task of transcribing house number sequences from Google Street View images and show that it is both more accurate than the state-of-the-art convolutional networks and uses fewer parameters and less computation.

Code Implementations5 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes