Hierarchical Normalization for Robust Monocular Depth Estimation
This work addresses robust monocular depth estimation for computer vision applications, improving on the image-level normalization strategies used by existing methods.
The paper tackles monocular depth estimation by addressing a limitation of image-level normalization, which can overlook fine-grained depth differences. It proposes a hierarchical normalization method that preserves these details and improves accuracy, setting a new state of the art on five zero-shot transfer benchmark datasets.
In this paper, we address monocular depth estimation with deep neural networks. To enable training of deep monocular estimation models on datasets from various sources, state-of-the-art methods adopt image-level normalization strategies to generate affine-invariant depth representations. However, learning with image-level normalization mainly emphasizes the relations of pixel representations to the global statistics of the image, such as the structure of the scene, while fine-grained depth differences may be overlooked. We propose a novel multi-scale depth normalization method that hierarchically normalizes the depth representations based on spatial information and depth distributions. Compared with previous normalization strategies applied only at the holistic image level, the proposed hierarchical normalization can effectively preserve fine-grained details and improve accuracy. We present two strategies that define the hierarchical normalization contexts in the depth domain and the spatial domain, respectively. Our extensive experiments show that the proposed normalization strategy substantially outperforms previous normalization methods, and we set a new state of the art on five zero-shot transfer benchmark datasets.
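To make the idea of hierarchical normalization contexts concrete, the following is a minimal NumPy sketch, not the implementation evaluated in the paper. It assumes a median shift and mean-absolute-deviation scale for each context, as is common in affine-invariant depth losses; the abstract does not specify the normalization statistics. The function names (`normalize`, `spatial_contexts`, `depth_contexts`, `hierarchical_loss`), the grid levels `(1, 2, 4)`, and the quantile-based depth bins are all hypothetical choices made for illustration.

```python
import numpy as np


def normalize(depth, mask):
    """Shift and scale `depth` using statistics of the pixels selected by `mask`.

    Assumption: median shift and mean-absolute-deviation scale.
    """
    values = depth[mask]
    shift = np.median(values)
    scale = np.mean(np.abs(values - shift))
    return (depth - shift) / max(scale, 1e-6)


def spatial_contexts(shape, levels=(1, 2, 4)):
    """Yield boolean masks for the cells of an n x n grid at each level.

    Level 1 is the whole image, i.e. plain image-level normalization.
    """
    h, w = shape
    for n in levels:
        rows = np.linspace(0, h, n + 1).astype(int)
        cols = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                mask = np.zeros(shape, dtype=bool)
                mask[rows[i]:rows[i + 1], cols[j]:cols[j + 1]] = True
                yield mask


def depth_contexts(gt, levels=(1, 2, 4)):
    """Yield boolean masks that split the ground-truth depth into n quantile bins."""
    for n in levels:
        edges = np.quantile(gt, np.linspace(0.0, 1.0, n + 1))
        for k in range(n):
            # The last bin is closed on the right so the maximum depth is included.
            upper = gt <= edges[k + 1] if k == n - 1 else gt < edges[k + 1]
            mask = (gt >= edges[k]) & upper
            if mask.any():
                yield mask


def hierarchical_loss(pred, gt, contexts):
    """Average the L1 error between normalized prediction and target over all contexts."""
    losses = []
    for mask in contexts:
        p = normalize(pred, mask)
        g = normalize(gt, mask)
        losses.append(np.mean(np.abs(p[mask] - g[mask])))
    return float(np.mean(losses))


# Usage: a prediction that differs from the target only by a positive scale
# and a shift incurs (numerically) zero loss in every context.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 10.0, size=(8, 8))
pred = 2.0 * gt + 3.0
loss_spatial = hierarchical_loss(pred, gt, spatial_contexts(gt.shape))
loss_depth = hierarchical_loss(pred, gt, depth_contexts(gt))
```

In this sketch, restricting `levels` to `(1,)` reduces both variants to a single image-level context. The finer levels add contexts whose local statistics make small depth differences inside a region or depth range contribute more to the loss.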