CV · Apr 25, 2019

Learning the Depths of Moving People by Watching Frozen People

arXiv:1904.11111v1 · 271 citations
Originality: Incremental advance
AI Analysis

The paper addresses depth estimation for dynamic, non-rigid objects in monocular video. The advance is incremental: it builds on existing data-driven approaches but introduces a novel source of training data.

The paper tackles the problem of predicting dense depth for freely moving people and cameras from monocular video by learning human depth priors from Internet videos of people freezing in poses, achieving improvement over state-of-the-art methods on real-world sequences.

We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Because people are stationary, training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We demonstrate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and show various 3D effects produced using our predicted depth.
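The motion parallax cue mentioned in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation, which is a learned model. It only shows the classical relation such a cue rests on, under strong simplifying assumptions: the camera translates purely sideways between two frames, the focal length and baseline are known, and a per-pixel human mask is given. The function name `parallax_depth` and all parameters are hypothetical.

```python
def parallax_depth(flow_x, human_mask, focal_px, baseline_m, min_disp=1e-6):
    """Depth from horizontal parallax for static pixels only.

    Assumes pure sideways camera translation, so depth = focal * baseline / |disparity|.
    Returns None where a person is present (parallax is unreliable for moving
    regions) or where the parallax is too small to invert.
    """
    depth = []
    for flow_row, mask_row in zip(flow_x, human_mask):
        row = []
        for disp, is_human in zip(flow_row, mask_row):
            if is_human or abs(disp) < min_disp:
                row.append(None)  # left for a learned prior to fill in
            else:
                row.append(focal_px * baseline_m / abs(disp))
        depth.append(row)
    return depth


# Toy 2x2 example: horizontal flow in pixels, one pixel covered by a person.
flow_x = [[10.0, 5.0], [2.0, 4.0]]
human_mask = [[False, False], [True, False]]
depth = parallax_depth(flow_x, human_mask, focal_px=100.0, baseline_m=0.5)
print(depth)
```

Pixels with larger parallax come out closer to the camera, and the masked human pixel is left empty. In the paper's setting, those are the regions where the learned human depth prior is needed.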

