CV · Sep 26, 2023

GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes

arXiv:2309.16019v1 · 27 citations · h-index: 43 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses depth estimation challenges in indoor environments. The advance is incremental: it builds on existing self-supervised methods and adds specific improvements.

This paper tackles self-supervised monocular depth estimation in indoor scenes by refining coarse camera poses from geometry and using vision transformers with self-distillation to handle low texture, achieving new state-of-the-art results on datasets like NYUv2 and ScanNet.

This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture. We ease the learning process by obtaining coarse camera poses from monocular sequences through multi-view geometry to deal with the former. However, we found that limited by the scale ambiguity across different scenes in the training dataset, a naïve introduction of geometric coarse poses cannot play a positive role in performance improvement, which is counter-intuitive. To address this problem, we propose to refine those poses during training through rotation and translation/scale optimization. To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism, providing more accurate depth guidance coming from the network itself. Experiments on NYUv2, ScanNet, 7scenes, and KITTI datasets support the effectiveness of each component in our framework, which sets a new state-of-the-art for indoor self-supervised monocular depth estimation, as well as outstanding generalization ability. Code and models are available at https://github.com/zxcqlf/GasMono
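The pose-refinement idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that refinement composes a small residual rotation with the coarse rotation and rescales the coarse translation by a scalar; the abstract does not give the exact formulation, so `refine_pose`, `r_residual`, and `scale` are hypothetical names, and the fixed numbers stand in for values a network would predict during training.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(angle_rad):
    """Rotation matrix about the z-axis by angle_rad radians."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def refine_pose(r_coarse, t_coarse, r_residual, scale):
    """Hypothetical refinement step (an assumption, not the paper's code):
    apply a residual rotation on top of the coarse rotation and rescale
    the coarse translation, whose magnitude is ambiguous across scenes."""
    r_refined = matmul3(r_residual, r_coarse)
    t_refined = [scale * x for x in t_coarse]
    return r_refined, t_refined

# Coarse pose from multi-view geometry: rotation is usable, but the
# translation is only known up to an unknown per-scene scale.
r_coarse = rot_z(math.radians(30.0))
t_coarse = [1.0, 0.0, 0.0]

# Stand-ins for learned corrections (fixed here for illustration only).
r_residual = rot_z(math.radians(2.0))
scale = 0.25

r_refined, t_refined = refine_pose(r_coarse, t_coarse, r_residual, scale)
```

In this toy setup the refined rotation is a 32 degree rotation about z and the translation shrinks to a quarter of its coarse length, which shows why a per-scene scale correction matters when coarse translations from different scenes are not mutually consistent.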

Code Implementations: 1 repo
