CV · LG · ML · Oct 13, 2021

Unsupervised Object Learning via Common Fate

arXiv:2110.06562v2 · 30 citations
Originality: Incremental advance
AI Analysis

This paper addresses unsupervised object learning for causal scene modeling. It is rated an incremental advance and contributes a new dataset, Fishbowl.

The paper tackles learning generative object models from unlabelled videos by decomposing the problem into three subtasks: unsupervised motion segmentation, training generative models of the background and of the moving objects, and combining those models in a conditional "dead leaves" scene model. The resulting models generalize beyond the occlusions seen in training and can sample novel scene configurations.

Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional "dead leaves" scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the Fishbowl dataset, positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos, and that represent scenes in a modular fashion. This modularity allows sampling plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed in the training set.
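
The third stage can be illustrated with a minimal sketch of dead-leaves compositing: objects are pasted over a background one at a time, so each new layer occludes whatever lies beneath it. This is not the authors' implementation. The function names (`compose_dead_leaves`, `sample_toy_object`, `sample_scene`) are hypothetical, and random coloured rectangles stand in for samples from the learned background and object models.

```python
import numpy as np


def compose_dead_leaves(background, objects):
    """Paste (rgb, mask) layers over a background, back to front.

    background: (H, W, 3) array; objects: list of ((H, W, 3) rgb, (H, W) mask).
    Returns the composed image and the visible (unoccluded) mask of each object.
    """
    canvas = background.astype(np.float64).copy()
    visible = []
    for rgb, mask in objects:
        m = mask.astype(np.float64)
        # The new layer overwrites the canvas wherever its mask is set.
        canvas = m[..., None] * rgb + (1.0 - m[..., None]) * canvas
        # Every earlier object loses the pixels this layer now covers.
        visible = [v * (1.0 - m) for v in visible]
        visible.append(m)
    return canvas, visible


def sample_toy_object(rng, height, width):
    """Assumed stand-in for an object model: a random coloured rectangle."""
    h = rng.integers(2, height // 2 + 1)
    w = rng.integers(2, width // 2 + 1)
    top = rng.integers(0, height - h + 1)
    left = rng.integers(0, width - w + 1)
    mask = np.zeros((height, width))
    mask[top:top + h, left:left + w] = 1.0
    rgb = np.ones((height, width, 3)) * rng.uniform(0.2, 1.0, size=3)
    return rgb, mask


def sample_scene(rng, n_objects, height=32, width=32):
    """Sample a scene with any number of objects, including unseen counts."""
    background = np.zeros((height, width, 3))  # stand-in for a background model
    objects = [sample_toy_object(rng, height, width) for _ in range(n_objects)]
    return compose_dead_leaves(background, objects)


rng = np.random.default_rng(0)
image, visible_masks = sample_scene(rng, n_objects=5)
```

Because the number of objects is just a loop count, the same sketch can produce scenes with more or fewer objects than any training video contained, which mirrors the modularity the abstract describes.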

Code Implementations: 1 repo