CV · Dec 10, 2020

Robust Consistent Video Depth Estimation

arXiv:2012.05901v2 · 255 citations
AI Analysis

This work provides a robust solution for accurate depth and pose estimation from noisy monocular video, which benefits applications such as AR/VR and robotics. It represents an incremental improvement over existing methods.

This paper introduces an algorithm for estimating consistent dense depth maps and camera poses from monocular video. It combines a learning-based depth prior with geometric optimization and outperforms state-of-the-art methods on the Sintel benchmark for both depth and pose estimation.

We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video. We integrate a learning-based depth prior, in the form of a convolutional neural network trained for single-image depth estimation, with geometric optimization, to estimate a smooth camera trajectory as well as detailed and stable depth reconstruction. Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details. In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations. Our method quantitatively outperforms state-of-the-art methods on the Sintel benchmark for both depth and pose estimation and attains favorable qualitative results across diverse in-the-wild datasets.

Code Implementations: 1 repo