CV · Aug 17, 2020

Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery

arXiv:2008.07246v1 · 15 citations
AI Analysis

This paper addresses the difficulty of acquiring ground-truth data for aerial depth estimation. It offers a self-supervised solution for researchers and practitioners in remote sensing and computer vision. The contribution is incremental, since it builds on existing self-supervised techniques.

The paper tackles monocular depth estimation from aerial imagery without annotated data. It learns depth and camera pose jointly from image sequences in a self-supervised manner and reaches a δ1.25 accuracy of up to 93.5%. Its results are inferior to those of conventional multi-view methods. However, the authors show that the estimates are suitable as an initialization for image-matching methods, or as estimates in challenging regions such as occluded areas.
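The δ1.25 figure is the usual threshold accuracy for depth maps. It is the share of pixels whose predicted depth d and reference depth d* satisfy max(d/d*, d*/d) < 1.25. The sketch below computes it over flattened depth lists. Two details are assumptions drawn from common practice, not from this summary, and the paper's evaluation protocol may differ:

- the validity mask, which skips non-positive depths;
- the optional median scaling, since monocular self-supervised depth is only defined up to scale.

```python
import math
import statistics


def median_scale(pred, gt):
    """Rescale predictions so their median matches the ground-truth median.

    Assumption: median scaling is a common step when evaluating scale-ambiguous
    monocular depth; the paper's own protocol is not stated here.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    scale = statistics.median([g for _, g in valid]) / statistics.median([p for p, _ in valid])
    return [p * scale for p in pred]


def delta_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    hits = valid = 0
    for p, g in zip(pred, gt):
        if p <= 0 or g <= 0:  # skip pixels without a usable depth value
            continue
        valid += 1
        if max(p / g, g / p) < threshold:
            hits += 1
    return hits / valid if valid else math.nan


pred = [10.0, 20.0, 30.0, 40.0]
gt = [10.5, 30.0, 29.0, 40.0]
print(delta_accuracy(pred, gt))  # 0.75: only the 20 vs 30 pixel fails (ratio 1.5)
```

Using a ratio rather than an absolute difference makes the metric treat a 1 m error at 10 m depth and a 10 m error at 100 m depth the same way. This is convenient for aerial scenes with a wide depth range.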

Supervised-learning-based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve a δ1.25 accuracy of up to 93.5%. In addition, we have paid particular attention to the generalization of a trained model to unknown data and the self-improving capabilities of our approach. We conclude that, even though the results of monocular depth estimation are inferior to those achieved by conventional methods, they are well suited to provide a good initialization for methods that rely on image matching or to provide estimates in regions where image matching fails, e.g., occluded or texture-less regions.
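Self-supervision from an image sequence is usually built on view synthesis. The method needs a predicted depth map for the target frame and a predicted relative pose (R, t) to a neighbouring frame. Each target pixel is back-projected into 3-D, moved into the neighbouring camera and projected again. The intensity found there is then compared with the target intensity.

The sketch below illustrates this standard objective under simplifying assumptions:

- pinhole intrinsics fx, fy, cx, cy;
- grayscale images stored as nested lists;
- nearest-neighbour sampling instead of differentiable bilinear sampling;
- a plain L1 error.

It is not the paper's exact loss. In training, depth and pose would come from the network, which per the abstract shares weights between both tasks, and this loss would be backpropagated into it. The helper names `reproject` and `photometric_loss` are hypothetical.

```python
def reproject(u, v, depth, intrinsics, R, t):
    """Map target pixel (u, v) with the given depth to source-image coordinates."""
    fx, fy, cx, cy = intrinsics
    # Back-project through the pinhole model: P_t = depth * K^-1 [u, v, 1]^T
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    Z = depth
    # Rigid transform into the source camera frame: P_s = R P_t + t
    Xs = R[0][0] * X + R[0][1] * Y + R[0][2] * Z + t[0]
    Ys = R[1][0] * X + R[1][1] * Y + R[1][2] * Z + t[1]
    Zs = R[2][0] * X + R[2][1] * Y + R[2][2] * Z + t[2]
    if Zs <= 0:  # point lies behind the source camera
        return None
    # Project back to pixel coordinates: p_s ~ K P_s
    return fx * Xs / Zs + cx, fy * Ys / Zs + cy


def photometric_loss(target, source, depth, intrinsics, R, t):
    """Mean absolute intensity difference between target and warped source pixels."""
    h, w = len(target), len(target[0])
    total, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            p = reproject(u, v, depth[v][u], intrinsics, R, t)
            if p is None:
                continue
            us, vs = round(p[0]), round(p[1])  # nearest-neighbour sampling
            if 0 <= us < w and 0 <= vs < h:  # ignore pixels warped out of view
                total += abs(target[v][u] - source[vs][us])
                count += 1
    return total / count if count else float("inf")


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [[float(4 * r + c) for c in range(4)] for r in range(4)]
source = [[0.0] + row[:3] for row in target]  # same scene, shifted one pixel right
depth = [[10.0] * 4 for _ in range(4)]
K = (10.0, 10.0, 1.5, 1.5)
print(photometric_loss(target, source, depth, K, identity, [1.0, 0.0, 0.0]))  # 0.0 (correct pose)
print(photometric_loss(target, source, depth, K, identity, [0.0, 0.0, 0.0]))  # 2.25 (wrong pose)
```

If the depth or the pose is wrong, target pixels land on the wrong source pixels and the loss rises. Minimizing it therefore supervises both estimates without any ground-truth depth.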

Code implementations: 1 repo