CVMar 10, 2022

SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning

arXiv:2203.05332v17 citationsh-index: 102
Originality Incremental advance
AI Analysis

This work addresses the scale ambiguity issue in monocular depth estimation for applications like mobile robot navigation, representing an incremental advancement by adapting pre-trained networks with self-supervision.

The paper tackles the problem of monocular depth estimation's scale ambiguity by introducing a self-supervised learning method that uses metric poses from monocular SLAM to enable metrically scaled depth predictions, showing improvements on datasets like EuRoC, OpenLORIS, and ScanNet.

Monocular depth estimation in the wild inherently predicts depth up to an unknown scale. To resolve scale ambiguity issue, we present a learning algorithm that leverages monocular simultaneous localization and mapping (SLAM) with proprioceptive sensors. Such monocular SLAM systems can provide metrically scaled camera poses. Given these metric poses and monocular sequences, we propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation. Our approach is based on a teacher-student formulation which guides our network to predict high-quality depths. We demonstrate that our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments. Our full system shows improvements over recent self-supervised depth estimation and completion methods on EuRoC, OpenLORIS, and ScanNet datasets.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes