CV · Nov 11, 2020

Learned Equivariant Rendering without Transformation Supervision

arXiv:2011.05787v1
AI Analysis

This paper addresses unsupervised object discovery and scene manipulation for computer vision applications, but it appears incremental: it builds on existing ideas in equivariance and self-supervised learning.

The paper tackles the problem of learning scene representations from video without supervision, automatically delineating objects and background by leveraging equivariance to transformations, and demonstrates real-time manipulation and rendering of unseen combinations on moving MNIST with backgrounds.

We propose a self-supervised framework to learn scene representations from video that are automatically delineated into objects and background. Our method relies on moving objects being equivariant with respect to their transformation across frames and the background being constant. After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, transformations, and backgrounds. We show results on moving MNIST with backgrounds.
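The two assumptions in the abstract (moving objects are equivariant with respect to their transformation, and the background is constant) can be illustrated with a toy sketch. This is not the authors' model: the helper names (`shift`, `render`, `toy_scene`, `blur`, `equivariance_gap`), the cyclic translation, the occlusion rule, and the synthetic square-on-gradient scene are all assumptions made for illustration. It only shows what it means for a map to commute with a translation and for a background to stay fixed while an object layer moves.

```python
import numpy as np

def shift(x, dy, dx):
    """Cyclically translate a 2D array by (dy, dx) pixels (assumed transformation)."""
    return np.roll(x, shift=(dy, dx), axis=(0, 1))

def render(obj, background):
    """Composite an object layer over a background; nonzero object pixels occlude."""
    return np.where(obj > 0, obj, background)

def toy_scene(size=16):
    """Hypothetical stand-in for a video frame: a bright square on a gradient."""
    obj = np.zeros((size, size))
    obj[2:6, 3:7] = 1.0
    background = np.linspace(0.1, 0.5, size * size).reshape(size, size)
    return obj, background

def blur(x):
    """Toy translation-equivariant map: 5-point cyclic neighbourhood average."""
    return (x + shift(x, 1, 0) + shift(x, -1, 0)
              + shift(x, 0, 1) + shift(x, 0, -1)) / 5.0

def equivariance_gap(encode, x, dy, dx):
    """Largest absolute difference between encode(shift(x)) and shift(encode(x))."""
    return np.abs(encode(shift(x, dy, dx)) - shift(encode(x), dy, dx)).max()

obj, background = toy_scene()
frame0 = render(obj, background)                # object at its original position
frame1 = render(shift(obj, 3, 2), background)   # same object moved, same background
gap = equivariance_gap(blur, obj, 3, 2)         # zero gap means the map is equivariant
```

Swapping `background` for another array, or changing the `(dy, dx)` passed to `shift`, produces a new combination of object, transformation, and background, which is the kind of recombination the abstract describes at a much larger scale.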

