CV LG Mar 28, 2016

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton

arXiv:1603.08575v3 36.4581 citations

Originality: Highly original

AI Analysis

This paper addresses the challenge of efficient, unsupervised object recognition in complex scenes for computer vision applications. It presents a novel method rather than an incremental improvement.

The paper tackles the problem of unsupervised scene understanding by introducing a recurrent neural network framework that learns to attend to and process objects sequentially, achieving accurate object counting, localization, and classification in 2D and 3D scenes without supervision.

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects - counting, locating and classifying the elements of a scene - without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.
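The control flow described in the abstract (attend to one scene element, infer it, then decide whether to take another step) can be illustrated with a toy loop. This is only a sketch under strong assumptions: the scene is a one-dimensional list of intensities, and a hand-written greedy rule stands in for the learned recurrent network. The function `infer_objects`, its parameters `max_steps` and `threshold`, and the variable names `present`, `where` and `what` are hypothetical and do not come from the paper's code.

```python
from typing import List, Tuple


def infer_objects(scene: List[float], max_steps: int = 5,
                  threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Greedy stand-in for a learned attend-infer-repeat loop (illustrative only)."""
    residual = list(scene)  # the part of the scene not yet explained
    objects: List[Tuple[int, float]] = []
    if not residual:
        return objects
    for _ in range(max_steps):
        # "Attend": pick the location with the largest unexplained intensity.
        where = max(range(len(residual)), key=lambda i: residual[i])
        # "Infer": describe the element found there (here, just its intensity).
        what = residual[where]
        # Presence decision: stop as soon as nothing salient remains,
        # so the number of steps adapts to the scene.
        present = what > threshold
        if not present:
            break
        objects.append((where, what))
        # Explain the element away so the next step attends elsewhere.
        residual[where] = 0.0
    return objects


scene = [0.0, 0.9, 0.0, 0.0, 0.7, 0.1]
found = infer_objects(scene)
print(len(found), found)  # prints: 2 [(1, 0.9), (4, 0.7)]
```

In this toy, the count, locations and contents of the elements fall out of a single pass through the loop, and the loop ends on its own once the residual holds nothing above the threshold. In the paper, the corresponding decisions are made by a trained network performing probabilistic inference rather than by a fixed threshold.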
