CVNov 19, 2015

Structured Depth Prediction in Challenging Monocular Video Sequences

arXiv:1511.06070v11 citations
Originality Incremental advance
AI Analysis

It addresses depth prediction for computer vision applications in dynamic environments where traditional methods fail, representing an incremental improvement.

The paper tackles depth estimation from monocular video in challenging scenarios like non-translational camera motion and dynamic scenes, achieving effective results in both indoor and outdoor settings.

In this paper, we tackle the problem of estimating the depth of a scene from a monocular video sequence. In particular, we handle challenging scenarios, such as non-translational camera motion and dynamic scenes, where traditional structure from motion and motion stereo methods do not apply. To this end, we first study the problem of depth estimation from a single image. In this context, we exploit the availability of a pool of images for which the depth is known, and formulate monocular depth estimation as a discrete-continuous optimization problem, where the continuous variables encode the depth of the superpixels in the input image, and the discrete ones represent relationships between neighboring superpixels. The solution to this discrete-continuous optimization problem is obtained by performing inference in a graphical model using particle belief propagation. To handle video sequences, we then extend our single image model to a two-frame one that naturally encodes short-range temporal consistency and inherently handles dynamic objects. Based on the prediction of this model, we then introduce a fully-connected pairwise CRF that accounts for longer range spatio-temporal interactions throughout a video. We demonstrate the effectiveness of our model in both the indoor and outdoor scenarios.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes