CV · LG · ML · Dec 12, 2013

Unsupervised learning of depth and motion

arXiv:1312.3429v2 · 41 citations
Originality: Incremental advance
AI Analysis

This paper addresses 3-D scene understanding for computer vision applications. It appears incremental, as it builds on biologically inspired units and existing learning frameworks.

The paper tackles the joint estimation of depth and motion from visual data by learning the interrelations between images from multiple cameras, multiple video frames, or both. The authors report state-of-the-art performance in 3-D activity analysis, outperforming existing hand-engineered 3-D motion features by a very large margin.

We present a model for the joint estimation of disparity and motion. The model is based on learning about the interrelations between images from multiple cameras, multiple frames in a video, or the combination of both. We show that learning depth and motion cues, as well as their combinations, from data is possible within a single type of architecture and a single type of learning algorithm, by using biologically inspired "complex cell" like units, which encode correlations between the pixels across image pairs. Our experimental results show that the learning of depth and motion makes it possible to achieve state-of-the-art performance in 3-D activity analysis, and to outperform existing hand-engineered 3-D motion features by a very large margin.
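To make the idea of "complex cell" like units that encode correlations across an image pair more concrete, here is a minimal sketch of a classic energy-model unit on 1-D signals. It is not the authors' implementation: the paper learns its filters from data, whereas this sketch hand-sets sinusoidal quadrature filters. The function name `energy_responses`, the signal length, the frequency range, and the candidate shifts are all assumptions chosen for illustration.

```python
import numpy as np

def energy_responses(x, y, shifts, freqs):
    """Response of hand-set energy units, one per candidate shift.

    Each unit sums the filter responses from both signals and squares them,
    so the cross term encodes the correlation between x and y.
    """
    n = np.arange(x.size)
    responses = []
    for d0 in shifts:
        total = 0.0
        for k in freqs:
            w = 2 * np.pi * k / x.size
            # Quadrature (cosine/sine) filter pair applied to the first signal.
            ax, bx = x @ np.cos(w * n), x @ np.sin(w * n)
            # Same pair applied to the second signal, phase-shifted so that
            # the unit is tuned to a displacement of d0 samples.
            phi = -w * d0
            ay, by = y @ np.cos(w * n + phi), y @ np.sin(w * n + phi)
            # Sum across the pair, then square: the "complex cell" energy.
            total += (ax + ay) ** 2 + (bx + by) ** 2
        responses.append(total)
    return np.array(responses)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)      # stand-in for a row of the left image
true_shift = 5                   # stand-in for disparity (or motion)
y = np.roll(x, true_shift)       # second view: circularly shifted copy

shifts = np.arange(-8, 9)
r = energy_responses(x, y, shifts, freqs=range(1, 9))
estimated_shift = shifts[np.argmax(r)]
print(estimated_shift)
```

The unit tuned to the true displacement responds most strongly, because only there do the filter responses from the two signals align in phase at every frequency. The same construction applies whether the pair comes from two cameras (disparity) or two video frames (motion), which is the sense in which a single type of unit can cover both cues.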

