Self-Supervised Monocular Image Depth Learning and Confidence Estimation
This work addresses the scarcity of annotated data for computer vision tasks such as depth estimation and offers a practical solution for applications like autonomous driving, although the contribution is incremental.
The paper tackles depth estimation from monocular images without ground-truth annotations by proposing a self-supervised framework that includes confidence estimation, and it reports state-of-the-art performance on the KITTI dataset.
Convolutional Neural Networks (CNNs) require large amounts of data with ground-truth annotations, and obtaining such data is a challenging problem that has limited the development and rapid deployment of CNNs for many computer vision tasks. We propose a novel framework that learns depth estimation from monocular images, together with a corresponding confidence, in a self-supervised manner. We propose a fully differentiable patch-based cost function based on the Zero-Mean Normalized Cross Correlation (ZNCC) that uses multi-scale patches as its matching strategy. This approach greatly increases the accuracy and robustness of depth learning. In addition, the proposed patch-based cost function provides a confidence value between 0 and 1, which is then used to supervise the training of a parallel network that learns to estimate confidence maps. Evaluation on the KITTI dataset shows that our method outperforms state-of-the-art methods.
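The ZNCC matching cost described above can be illustrated with a minimal NumPy sketch. It computes ZNCC between two equally sized patches, averages it over several square patch sizes centred on the same pixel, and turns the score into a cost and a 0 to 1 confidence. The patch sizes (3, 5, 7), the plain averaging across scales, the cost `1 - ZNCC`, and the linear mapping `(ZNCC + 1) / 2` to a confidence are assumptions for illustration only; the abstract does not state how the scales are combined or how the confidence is derived. All function names are hypothetical. In an actual training pipeline the same operations would be written with autodiff tensors, so that the cost remains differentiable with respect to the predicted depth used to synthesize the warped view.

```python
import numpy as np


def zncc(patch_a: np.ndarray, patch_b: np.ndarray, eps: float = 1e-6) -> float:
    """Zero-mean normalized cross correlation of two equally sized patches.

    Returns a value in [-1, 1]; it is invariant to affine intensity changes
    (gain and offset), which makes it robust to brightness differences.
    """
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a = a - a.mean()  # remove the patch mean (the "zero-mean" part)
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps  # eps avoids division by zero on flat patches
    return float((a * b).sum() / denom)


def multiscale_zncc(img_a: np.ndarray, img_b: np.ndarray, row: int, col: int,
                    sizes=(3, 5, 7)) -> float:
    """Average ZNCC over square patches of several (assumed) sizes centred at (row, col).

    Assumes (row, col) lies at least max(sizes) // 2 pixels away from every image border.
    """
    scores = []
    for s in sizes:
        r = s // 2
        pa = img_a[row - r:row + r + 1, col - r:col + r + 1]
        pb = img_b[row - r:row + r + 1, col - r:col + r + 1]
        scores.append(zncc(pa, pb))
    return float(np.mean(scores))


def matching_cost(score: float) -> float:
    """Assumed cost: 0 for a perfect match, up to 2 for perfectly anti-correlated patches."""
    return 1.0 - score


def confidence_from_zncc(score: float) -> float:
    """Assumed linear mapping of a ZNCC score in [-1, 1] to a confidence in [0, 1]."""
    return float(np.clip(0.5 * (score + 1.0), 0.0, 1.0))


# Usage: compare a target image with a noisy stand-in for a view synthesized
# from predicted depth (synthetic data, not from the paper).
rng = np.random.default_rng(0)
target = rng.random((32, 32))
warped = target + 0.05 * rng.standard_normal((32, 32))
score = multiscale_zncc(target, warped, 16, 16)
print("cost:", matching_cost(score), "confidence:", confidence_from_zncc(score))
```

Because ZNCC normalizes each patch by its own mean and energy, the same pixel pair can be scored consistently under lighting changes between frames, and using several patch sizes trades local precision (small patches) against robustness in low-texture regions (large patches).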