MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation
This work addresses the scarcity of ground-truth depth data for monocular depth estimation in computer vision, offering an incremental improvement through a new masked consistency regularization technique.
The paper aims to reduce the reliance of monocular depth estimation on large ground-truth datasets. It proposes MaskingDepth, a semi-supervised framework that enforces consistency between predictions on strongly-augmented unlabeled images and pseudo-labels derived from weakly-augmented ones, and reports better performance than other state-of-the-art semi-supervised methods on the KITTI and NYU-Depth-v2 datasets.
We propose MaskingDepth, a novel semi-supervised learning framework for monocular depth estimation that mitigates the reliance on large quantities of ground-truth depth. MaskingDepth is designed to enforce consistency between the strongly-augmented unlabeled data and the pseudo-labels derived from weakly-augmented unlabeled data, which enables learning depth without supervision. Within this framework, we propose a novel data augmentation that takes advantage of a naive masking strategy while avoiding its two drawbacks: scale ambiguity between the depths from the weakly- and strongly-augmented branches, and the risk of missing small-scale instances. To retain only high-confidence depth predictions from the weakly-augmented branch as pseudo-labels, we also present an uncertainty estimation technique, which is used to define a robust consistency regularization. Experiments on the KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component, the method's robustness to the use of fewer depth-annotated images, and superior performance compared to other state-of-the-art semi-supervised methods for monocular depth estimation. Furthermore, we show that our method can be easily extended to the domain adaptation task. Our code is available at https://github.com/KU-CVLAB/MaskingDepth.
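The idea of keeping only confident pseudo-labels in the consistency term can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss form or the uncertainty estimator, so the L1 difference, the hard threshold `max_uncertainty`, and the function name `filtered_consistency_loss` are all assumptions made for illustration. Per-pixel values are given as flat Python lists to keep the example dependency-free; a real pipeline would operate on image tensors.

```python
def filtered_consistency_loss(pseudo_depth, student_depth, uncertainty,
                              max_uncertainty=0.1):
    """Mean absolute error over pixels whose pseudo-label is confident.

    pseudo_depth:  depths predicted from the weakly-augmented image
    student_depth: depths predicted from the strongly-augmented (masked) image
    uncertainty:   per-pixel uncertainty of the pseudo-labels (hypothetical)

    The L1 form and the hard threshold are illustrative assumptions,
    not the formulation used in the paper.
    """
    if not (len(pseudo_depth) == len(student_depth) == len(uncertainty)):
        raise ValueError("inputs must have the same number of pixels")

    # Keep the per-pixel error only where the pseudo-label is confident.
    kept = [
        abs(p - s)
        for p, s, u in zip(pseudo_depth, student_depth, uncertainty)
        if u < max_uncertainty
    ]

    # If no pixel is confident, contribute no loss instead of dividing by zero.
    return sum(kept) / len(kept) if kept else 0.0


# Toy example with four pixels; the third pseudo-label is unreliable.
pseudo = [2.0, 5.0, 9.0, 1.0]
student = [2.5, 5.0, 3.0, 1.5]
sigma = [0.02, 0.05, 0.80, 0.03]
loss = filtered_consistency_loss(pseudo, student, sigma)
```

In the toy example the third pixel has a large disagreement between branches, but it is excluded because its uncertainty exceeds the threshold, so the unreliable pseudo-label does not drive the loss.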