Training Self-Supervised Depth Completion Using Sparse Measurements and a Single Image
This work addresses the cost of dense annotations and the multi-frame dependencies in depth completion for applications such as robotics and autonomous driving, offering an incremental improvement over existing self-supervised approaches.
The paper tackles the problem of training depth completion models without dense depth labels or multiple images. It proposes a self-supervised method that uses only sparse depth measurements and a single image, and reports competitive performance on benchmarks such as KITTI and NYU Depth V2.
Depth completion is an important vision task, and many efforts have been made to enhance the quality of depth maps from sparse depth measurements. Despite significant advances, training these models to recover dense depth from sparse measurements remains a challenging problem. Supervised learning methods rely on dense depth labels to predict unobserved regions, while self-supervised approaches require image sequences to enforce geometric constraints and photometric consistency between frames. However, acquiring dense annotations is costly, and multi-frame dependencies limit the applicability of self-supervised methods in static or single-frame scenarios. To address these challenges, we propose a novel self-supervised depth completion paradigm that requires only sparse depth measurements and their corresponding image for training. Unlike existing methods, our approach eliminates the need for dense depth labels or additional images captured from neighboring viewpoints. By leveraging the characteristics of depth distribution, we design novel loss functions that effectively propagate depth information from observed points to unobserved regions. Additionally, we incorporate segmentation maps generated by vision foundation models to further enhance depth estimation. Extensive experiments demonstrate the effectiveness of our proposed method.
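To make the training signal concrete, the following is a minimal sketch of how a loss could be built from sparse measurements and a segmentation map alone. It is not the paper's formulation, since the abstract does not specify the loss functions. The function names, the per-segment median anchor, and the 0.1 weight are all assumptions chosen for illustration, and the propagation term makes the crude assumption that depth is roughly constant within a segment.

```python
import numpy as np


def sparse_l1_loss(pred, sparse, valid):
    """Mean absolute error at pixels that carry a sparse measurement."""
    n = int(valid.sum())
    if n == 0:
        return 0.0
    return float(np.abs(pred - sparse)[valid].sum() / n)


def segment_consistency_loss(pred, sparse, valid, segments):
    """Hypothetical propagation term (an assumption, not the paper's loss).

    Within each segment that contains at least one measurement, predictions
    are pulled toward the median of the observed depths in that segment.
    """
    total, count = 0.0, 0
    for seg_id in np.unique(segments):
        in_seg = segments == seg_id
        observed = in_seg & valid
        if not observed.any():
            continue  # nothing to propagate into this segment
        anchor = np.median(sparse[observed])
        total += float(np.abs(pred[in_seg] - anchor).sum())
        count += int(in_seg.sum())
    return total / count if count else 0.0


# Toy 2x3 example: zeros in `sparse` mark pixels without a measurement.
pred = np.array([[2.0, 2.0, 5.0],
                 [2.0, 2.0, 5.0]])
sparse = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 6.0]])
valid = sparse > 0
segments = np.array([[0, 0, 1],
                     [0, 0, 1]])  # e.g. labels from a segmentation model

data_term = sparse_l1_loss(pred, sparse, valid)
prop_term = segment_consistency_loss(pred, sparse, valid, segments)
loss = data_term + 0.1 * prop_term  # 0.1 is an arbitrary illustrative weight
print(data_term, prop_term, loss)
```

In this sketch the first term supervises only the observed pixels, while the second spreads each measurement to the unobserved pixels that share its segment. A practical loss would likely need a weaker prior than constant depth per segment, for example a planar or smoothness constraint.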