Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
This work addresses the need for more accurate depth estimation in applications such as robot navigation and 3D reconstruction, particularly for man-made indoor scenes. The contribution is incremental in that it builds on existing learning-based methods.
The paper tackles multi-frame depth estimation from color video, where existing methods produce depth maps that fail to preserve geometric features such as corners and planes in man-made scenes. It introduces a Combined Normal Map constraint and an occlusion-aware strategy to improve accuracy and feature preservation.
We present a new learning-based method for multi-frame depth estimation from a color video, which is a fundamental problem in scene understanding, robot navigation, and handheld 3D reconstruction. While recent learning-based methods estimate depth with high accuracy, 3D point clouds exported from their depth maps often fail to preserve important geometric features (e.g., corners, edges, planes) of man-made scenes. Widely used pixel-wise depth errors do not specifically penalize inconsistency on these features. These inaccuracies are particularly severe when successive depth reconstructions are accumulated in an attempt to scan a full environment containing man-made objects with such features. Our depth estimation algorithm therefore introduces a Combined Normal Map (CNM) constraint, which is designed to better preserve high-curvature features and global planar regions. To further improve depth estimation accuracy, we introduce a new occlusion-aware strategy that aggregates initial depth predictions from multiple adjacent views into one final depth map and one occlusion probability map for the current reference view. Our method outperforms the state of the art in depth estimation accuracy and preserves essential geometric features of man-made indoor scenes much better than other algorithms.
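The claim that pixel-wise depth errors do not penalize inconsistency on geometric features can be illustrated with a small experiment. The following is a minimal sketch and not the authors' CNM constraint or loss: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, estimates normals by simple finite differences, and uses illustrative function names throughout. It compares two predictions of a flat wall that have the same mean absolute depth error but very different surface normals.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Lift each pixel (u, v) with depth z to a 3D camera-space point."""
    return [[((u - cx) * z / fx, (v - cy) * z / fy, z)
             for u, z in enumerate(row)]
            for v, row in enumerate(depth)]

def normals_from_depth(depth, fx, fy, cx, cy):
    """Unit normals from forward differences of back-projected points.

    Returns an (H-1) x (W-1) grid; this is an assumed, simplistic estimator.
    """
    pts = backproject(depth, fx, fy, cx, cy)
    h, w = len(pts), len(pts[0])
    normals = []
    for v in range(h - 1):
        row = []
        for u in range(w - 1):
            p = pts[v][u]
            dx = [a - b for a, b in zip(pts[v][u + 1], p)]
            dy = [a - b for a, b in zip(pts[v + 1][u], p)]
            # Cross product of the two tangent vectors.
            n = (dx[1] * dy[2] - dx[2] * dy[1],
                 dx[2] * dy[0] - dx[0] * dy[2],
                 dx[0] * dy[1] - dx[1] * dy[0])
            norm = math.sqrt(sum(c * c for c in n)) or 1.0
            row.append(tuple(c / norm for c in n))
        normals.append(row)
    return normals

def mean_abs_depth_error(pred, gt):
    """Pixel-wise mean absolute depth error."""
    diffs = [abs(p - g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    return sum(diffs) / len(diffs)

def mean_normal_angle_deg(n_a, n_b):
    """Mean angle in degrees between corresponding unit normals."""
    total, count = 0.0, 0
    for ra, rb in zip(n_a, n_b):
        for a, b in zip(ra, rb):
            dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
            total += math.degrees(math.acos(dot))
            count += 1
    return total / count

# Hypothetical intrinsics and a tiny fronto-parallel wall at 2 m.
H, W = 6, 6
fx = fy = 100.0
cx = cy = 3.0
gt = [[2.0] * W for _ in range(H)]
# Prediction A: constant 5 cm offset, the plane is preserved.
offset = [[2.05] * W for _ in range(H)]
# Prediction B: +/- 5 cm checkerboard noise, the plane is destroyed.
noisy = [[2.0 + (0.05 if (u + v) % 2 == 0 else -0.05) for u in range(W)]
         for v in range(H)]

n_gt = normals_from_depth(gt, fx, fy, cx, cy)
offset_depth_err = mean_abs_depth_error(offset, gt)
noisy_depth_err = mean_abs_depth_error(noisy, gt)
offset_normal_err = mean_normal_angle_deg(normals_from_depth(offset, fx, fy, cx, cy), n_gt)
noisy_normal_err = mean_normal_angle_deg(normals_from_depth(noisy, fx, fy, cx, cy), n_gt)

print("depth error  (offset, noisy):", offset_depth_err, noisy_depth_err)
print("normal error (offset, noisy):", offset_normal_err, noisy_normal_err)
```

Both predictions score the same on the pixel-wise metric, yet only the offset prediction keeps the wall planar. A normal-based term separates the two cases, which is the kind of signal a normal constraint is meant to provide.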