LGCVMLJun 24, 2014

Recurrent Models of Visual Attention

arXiv:1406.6247v13963 citations
Originality Highly original
AI Analysis

This work addresses the problem of high computational cost in visual processing for researchers and practitioners in computer vision, offering a novel approach that is not incremental.

The authors tackled the computational expense of processing large images with CNNs by introducing a recurrent neural network model that selectively attends to image regions, achieving significant performance improvements over CNN baselines on cluttered image classification tasks and learning to track objects without explicit training.

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.

Code Implementations20 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes