DESC: Domain Adaptation for Depth Estimation via Semantic Consistency
This work addresses the difficulty of acquiring accurate depth data for training, which matters for applications such as autonomous driving and robotics. The contribution is incremental, however, because it builds on existing domain adaptation techniques.
The paper tackles the problem of monocular depth estimation when real depth annotations are scarce by proposing a domain adaptation approach that uses a fully-annotated source dataset and a non-annotated target dataset, achieving consistent improvements over state-of-the-art methods on standard benchmarks.
Accurate real depth annotations are difficult to acquire, requiring special devices such as a LiDAR sensor. Self-supervised methods try to overcome this problem by processing video or stereo sequences, which may not always be available. Instead, in this paper, we propose a domain adaptation approach to train a monocular depth estimation model using a fully-annotated source dataset and a non-annotated target dataset. We bridge the domain gap by leveraging semantic predictions and low-level edge features to provide guidance for the target domain. We enforce consistency between the main model and a second model trained with semantic segmentation and edge maps, and introduce priors in the form of instance heights. Our approach is evaluated on standard domain adaptation benchmarks for monocular depth estimation and shows consistent improvement upon the state-of-the-art.
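
The two guidance signals described above can be illustrated with a minimal sketch. The abstract does not give the loss or the form of the prior, so everything here is an assumption made for illustration: the consistency term is taken to be a mean absolute difference between the two models' depth maps, the instance-height prior is taken to be the standard pinhole relation depth = f * H / h, and the function names `consistency_loss` and `depth_from_instance_height` are hypothetical.

```python
def consistency_loss(depth_main, depth_guided):
    """Mean absolute difference between two depth maps.

    Both maps are nested lists of equal shape. ASSUMPTION: an L1 term is
    used here for illustration; the paper's actual loss may differ.
    """
    total = 0.0
    count = 0
    for row_main, row_guided in zip(depth_main, depth_guided):
        for d_main, d_guided in zip(row_main, row_guided):
            total += abs(d_main - d_guided)
            count += 1
    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count


def depth_from_instance_height(focal_px, real_height_m, pixel_height):
    """Approximate depth of an object from its apparent height.

    Pinhole camera relation: depth = f * H / h, where f is the focal
    length in pixels, H an assumed real-world height in metres, and h the
    instance height in pixels. ASSUMPTION: this is one plausible way to
    turn an instance-height prior into a depth cue.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * real_height_m / pixel_height


# Toy 2x2 depth maps (metres) from the main and the semantics-guided model.
main_pred = [[1.0, 2.0], [3.0, 4.0]]
guided_pred = [[1.0, 2.5], [2.0, 4.0]]
loss = consistency_loss(main_pred, guided_pred)

# A car assumed to be 1.5 m tall that spans 54 pixels at f = 720 px.
prior_depth = depth_from_instance_height(720.0, 1.5, 54.0)
```

In a training setup of this kind, the consistency term would be computed on target-domain images, where no ground-truth depth is available, while the height-based estimate could serve as a coarse per-instance depth cue.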