CV, GR · Aug 2, 2021

Consistent Depth of Moving Objects in Video

Zhoutong Zhang, Forrester Cole, Richard Tucker, William T. Freeman, Tali Dekel

arXiv:2108.01166v1 · 81 citations

AI Analysis

This paper addresses the underconstrained problem of depth estimation in dynamic scenes, with applications such as video editing. The contribution is incremental: it builds on existing depth-prediction methods and adds a new test-time training framework.

The paper tackles the problem of estimating geometrically and temporally consistent depth for dynamic scenes with moving objects from a single moving-camera video, achieving accurate and coherent results on challenging videos.

We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this underconstrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video. By recursively unrolling the scene-flow prediction MLP over varying time steps, we compute both short-range scene flow to impose local smooth motion priors directly in 3D, and long-range scene flow to impose multi-view consistency constraints with wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars), as well as camera motion. Our depth maps give rise to a number of depth-and-motion aware video editing effects such as object and lighting insertion.
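The recursive unrolling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network size, the random weights, and the names `make_scene_flow_mlp` and `unroll_scene_flow` are assumptions made for illustration, and NumPy stands in for a trainable framework. The idea shown is only that a one-step scene-flow function, applied repeatedly, yields short-range flow after one step and long-range flow after many steps.

```python
import numpy as np


def make_scene_flow_mlp(rng, hidden=16):
    """Hypothetical tiny MLP mapping (x, y, z, t) to a 3D displacement.

    Weights are random here; in a test-time training setup they would be
    optimised on the input video.
    """
    w1 = rng.normal(scale=0.1, size=(4, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=(hidden, 3))
    b2 = np.zeros(3)

    def flow(points, t):
        # points: (N, 3) array of 3D positions at frame t
        t_col = np.full((points.shape[0], 1), float(t))
        h = np.tanh(np.concatenate([points, t_col], axis=1) @ w1 + b1)
        return h @ w2 + b2  # (N, 3) displacement from frame t to t + 1

    return flow


def unroll_scene_flow(flow, points, t_start, num_steps):
    """Apply the one-step flow recursively.

    Returns the predicted 3D positions at frame t_start + num_steps.
    """
    current = np.asarray(points, dtype=float)
    for k in range(num_steps):
        current = current + flow(current, t_start + k)
    return current


rng = np.random.default_rng(0)
flow = make_scene_flow_mlp(rng)
points_t0 = rng.normal(size=(5, 3))  # stand-in for unprojected depth at frame 0

short_range = unroll_scene_flow(flow, points_t0, t_start=0, num_steps=1)
long_range = unroll_scene_flow(flow, points_t0, t_start=0, num_steps=8)
```

In a setup like the one the abstract describes, `short_range` would feed a smooth-motion prior in 3D and `long_range` would be compared against the depth of corresponding points in a distant frame; both loss terms are omitted here because the abstract does not specify them.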
