CV · Jun 4, 2021

Self-Supervised Learning of Domain Invariant Features for Depth Estimation

arXiv:2106.02594v4 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of domain shift in depth estimation for computer vision applications, offering an incremental improvement over existing methods.

The paper tackles unsupervised synthetic-to-real domain adaptation for single-image depth estimation. It proposes a self-supervised training strategy that learns domain-invariant features, which improves generalization to real images and outperforms state-of-the-art methods by 14.7% on Sq Rel on the KITTI dataset.
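For readers unfamiliar with the metric, Sq Rel (squared relative error) is commonly computed in depth-estimation benchmarks as the mean of (predicted − ground truth)² / ground truth over pixels with valid ground truth. The sketch below shows that definition in plain Python. The function names and the sample numbers are illustrative and do not come from the paper, and treating the 14.7% figure as a relative reduction in error is an assumption, since the summary does not say how it was derived.

```python
def sq_rel(pred, gt):
    """Squared relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline_error, new_error):
    """Fractional reduction in error relative to a baseline (lower error is better)."""
    return (baseline_error - new_error) / baseline_error


# Illustrative depths in metres (made-up values, not results from the paper).
predicted = [2.0, 4.0, 9.0]
ground_truth = [1.0, 4.0, 0.0]  # 0.0 marks a pixel without ground truth
print(sq_rel(predicted, ground_truth))

# Under the relative-reduction assumption, a hypothetical baseline error of
# 1.000 dropping to 0.853 corresponds to an improvement of about 14.7%.
print(relative_improvement(1.000, 0.853))
```

Because the error is divided by the ground-truth depth, mistakes on nearby surfaces are penalised more heavily than equally sized mistakes far away.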

We tackle the problem of unsupervised synthetic-to-real domain adaptation for single image depth estimation. An essential building block of single image depth estimation is an encoder-decoder task network that takes RGB images as input and produces depth maps as output. In this paper, we propose a novel training strategy to force the task network to learn domain invariant representations in a self-supervised manner. Specifically, we extend self-supervised learning from traditional representation learning, which works on images from a single domain, to domain invariant representation learning, which works on images from two different domains by utilizing an image-to-image translation network. Firstly, we use an image-to-image translation network to transfer domain-specific styles between synthetic and real domains. This style transfer operation allows us to obtain similar images from the different domains. Secondly, we jointly train our task network and Siamese network with the same images from the different domains to obtain domain invariance for the task network. Finally, we fine-tune the task network using labeled synthetic and unlabeled real-world data. Our training strategy yields improved generalization capability in the real-world domain. We carry out an extensive evaluation on two popular datasets for depth estimation, KITTI and Make3D. The results demonstrate that our proposed method outperforms the state-of-the-art on all metrics, e.g. by 14.7% on Sq Rel on KITTI. The source code and model weights will be made available.
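The idea of training on "the same images from the different domains" can be illustrated with a small feature-agreement term. The abstract does not state which loss the Siamese network uses, so the sketch below is only an assumption: it scores how closely the features of an image and of its style-translated counterpart agree, using one minus cosine similarity. The names `cosine_similarity` and `invariance_loss` and the feature vectors are hypothetical and stand in for the outputs of the task network's encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def invariance_loss(feat_original, feat_translated):
    """Assumed agreement term: 0 when the features point the same way, up to 2 when opposite."""
    return 1.0 - cosine_similarity(feat_original, feat_translated)


# Hypothetical encoder features for a synthetic image and its real-styled translation.
synthetic_features = [0.9, 0.1, 0.4]
translated_features = [0.8, 0.2, 0.5]
print(invariance_loss(synthetic_features, translated_features))
```

Minimising a term of this kind pushes the encoder to produce similar features for the same scene content regardless of whether the image carries a synthetic or a real style, which is the sense of domain invariance the abstract describes.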

