CV | LG | IV | Jul 22, 2020

Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

arXiv:2007.11256v1 | 37 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving depth estimation accuracy for applications in 3D recognition and understanding, but it appears incremental as it builds on existing methods with specific enhancements.

The paper tackles inaccurate spatial layout and ambiguous boundaries in monocular depth estimation by proposing a structure-aware neural network with spatial attention blocks, a global focal relative loss, and a new Hard Case Depth dataset used with an informed learning curriculum. The authors report that the method outperforms state-of-the-art approaches by a large margin, both in prediction accuracy on the NYUDv2 dataset and in generalization to unseen datasets.

Monocular depth estimation plays a crucial role in 3D recognition and understanding. One key limitation of existing approaches lies in their lack of structural information exploitation, which leads to inaccurate spatial layout, discontinuous surfaces, and ambiguous boundaries. In this paper, we tackle this problem in three aspects. First, to exploit the spatial relationship of visual features, we propose a structure-aware neural network with spatial attention blocks. These blocks guide the network's attention to global structures or local details across different feature layers. Second, we introduce a global focal relative loss for uniform point pairs to enhance the spatial constraint in the prediction, and explicitly increase the penalty on errors in depth-wise discontinuous regions, which helps preserve the sharpness of the estimation results. Finally, based on an analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes, such as special lighting conditions, dynamic objects, and tilted camera angles. The new dataset is leveraged by an informed learning curriculum that mixes training examples incrementally to handle diverse data distributions. Experimental results show that our method outperforms state-of-the-art approaches by a large margin in terms of both prediction accuracy on the NYUDv2 dataset and generalization performance on unseen datasets.
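
The abstract names a global focal relative loss over uniformly sampled point pairs but does not give its formula. The sketch below is only a generic illustration of that idea, assuming a pairwise ranking loss with a focal-style modulating factor; it is not the paper's definition. The function names, the ratio tolerance `tau`, the focusing exponent `gamma`, and the squared penalty for roughly equal pairs are all hypothetical choices made for this example.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)), computed as -softplus(-x).
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def ordinal_label(d_a, d_b, tau=0.02):
    # +1 if point a is farther than b, -1 if closer, 0 if roughly equal.
    # tau is an assumed tolerance on the ground-truth depth ratio.
    ratio = d_a / d_b
    if ratio > 1.0 + tau:
        return 1
    if ratio < 1.0 / (1.0 + tau):
        return -1
    return 0


def sample_uniform_pairs(n_pixels, n_pairs, seed=0):
    # Draw pixel-index pairs uniformly at random over the whole image,
    # skipping pairs that pick the same pixel twice.
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        a, b = rng.randrange(n_pixels), rng.randrange(n_pixels)
        if a != b:
            pairs.append((a, b))
    return pairs


def focal_relative_loss(pred, gt, pairs, gamma=2.0, tau=0.02):
    # pred and gt are flat lists of per-pixel depths (gt must be positive).
    total = 0.0
    for a, b in pairs:
        diff = pred[a] - pred[b]
        r = ordinal_label(gt[a], gt[b], tau)
        if r == 0:
            # Roughly equal depths: penalize any predicted difference.
            total += diff * diff
        else:
            # p is the model's confidence in the correct ordering.
            log_p = log_sigmoid(r * diff)
            p = math.exp(log_p)
            # The (1 - p)^gamma factor (assumed) down-weights pairs that
            # are already ordered confidently and emphasizes hard pairs.
            total += -((1.0 - p) ** gamma) * log_p
    return total / len(pairs)


if __name__ == "__main__":
    gt = [1.0, 2.0, 4.0]
    pairs = [(0, 1), (1, 2), (0, 2)]
    print(focal_relative_loss([1.0, 2.0, 4.0], gt, pairs))  # correct order
    print(focal_relative_loss([4.0, 2.0, 1.0], gt, pairs))  # reversed order
```

With `gamma=0` the modulating factor is 1 and the sketch reduces to a plain pairwise ranking loss, so the exponent controls how strongly misordered pairs, which tend to cluster around depth discontinuities, dominate the total.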

