CV · LG · Mar 28, 2016

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

arXiv:1603.08575v3 · 581 citations
Originality: Highly original
AI Analysis

This work addresses efficient, unsupervised object recognition in complex scenes for computer vision applications. It is a novel method rather than an incremental improvement.

The paper tackles the problem of unsupervised scene understanding by introducing a recurrent neural network framework that learns to attend to and process objects sequentially, achieving accurate object counting, localization, and classification in 2D and 3D scenes without supervision.

We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network that attends to scene elements and processes them one at a time. Crucially, the model itself learns to choose the appropriate number of inference steps. We use this scheme to learn to perform inference in partially specified 2D models (variable-sized variational auto-encoders) and fully specified 3D models (probabilistic renderers). We show that such models learn to identify multiple objects - counting, locating and classifying the elements of a scene - without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization.
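The abstract's central idea, attending to one scene element per step and letting the model decide when to stop, can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: `Step`, `infer_scene`, `toy_propose`, and the `max_steps` parameter are hypothetical names, the scene is a plain list rather than an image, and the hand-written `toy_propose` function stands in for the learned recurrent inference network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    present: bool            # whether another object remains to be explained
    where: Tuple[int, int]   # location of the attended object
    what: str                # description of the attended object


def infer_scene(scene, propose: Callable, max_steps: int = 3) -> List[Step]:
    """Attend to one object per step; stop when `propose` reports none left."""
    explained: List[Step] = []  # stands in for the recurrent hidden state
    for _ in range(max_steps):
        step = propose(scene, explained)
        if not step.present:
            break  # the number of inference steps adapts to the scene
        explained.append(step)
    return explained


def toy_propose(scene, explained) -> Step:
    """Hypothetical stand-in for a learned network: pick the next unexplained item."""
    seen = {s.where for s in explained}
    for where, what in scene:
        if where not in seen:
            return Step(True, where, what)
    return Step(False, (0, 0), "")


# A toy "scene" with two objects, given as (location, label) pairs.
scene = [((2, 3), "digit-7"), ((10, 4), "digit-1")]
objects = infer_scene(scene, toy_propose)
print(len(objects), [o.what for o in objects])  # prints: 2 ['digit-7', 'digit-1']
```

The length of the returned list plays the role of the object count, and each entry carries a location and a description, which mirrors the counting, locating, and classifying described in the abstract. In the paper these quantities are inferred probabilistically from pixels; here they are simply read off the toy scene.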

Code Implementations: 2 repos
