CV, LG · Feb 27, 2020

Semantically-Guided Representation Learning for Self-Supervised Monocular Depth

arXiv:2002.12319v1 · 265 citations
AI Analysis

This work addresses monocular depth estimation for computer vision applications, offering incremental improvements by integrating semantic guidance into self-supervised learning.

The paper tackles self-supervised monocular depth estimation by using semantic structure to guide geometric representation learning without training on semantic labels. It reports improvements over the state of the art in depth prediction across all pixels, on fine-grained details, and per semantic category.

Self-supervised learning is showing great promise for monocular depth estimation, using geometry as the only source of supervision. Depth networks are indeed capable of learning representations that relate visual appearance to 3D properties by implicitly leveraging category-level patterns. In this work we investigate how to leverage more directly this semantic structure to guide geometric representation learning, while remaining in the self-supervised regime. Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions. Furthermore, we propose a two-stage training process to overcome a common semantic bias on dynamic objects via resampling. Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
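The guidance mechanism named in the abstract, pixel-adaptive convolution, can be illustrated with a minimal NumPy sketch. It assumes the commonly used formulation in which a shared spatial filter is reweighted at each pixel by a Gaussian kernel on the difference between guidance features, so that filtering does not blend across semantic boundaries. The function name `pixel_adaptive_conv`, the single-channel setup, and the unit-variance Gaussian are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def pixel_adaptive_conv(values, guidance, weight, bias=0.0):
    """Single-channel pixel-adaptive convolution (illustrative sketch).

    values:   (H, W) feature map to filter, e.g. a depth-decoder feature
    guidance: (H, W, C) guidance features, e.g. from a frozen semantic network
    weight:   (k, k) spatial filter shared by all pixels, k odd
    """
    k = weight.shape[0]
    r = k // 2
    H, W = values.shape
    # Zero-pad the values; edge-pad the guidance so border pixels compare
    # against a copy of their nearest valid guidance feature.
    v = np.pad(values, r)
    g = np.pad(guidance, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.full((H, W), bias, dtype=float)
    for dy in range(k):
        for dx in range(k):
            v_n = v[dy:dy + H, dx:dx + W]   # neighbour values at this offset
            g_n = g[dy:dy + H, dx:dx + W]   # neighbour guidance at this offset
            # Gaussian kernel on guidance difference: close to 1 when the
            # neighbour looks semantically similar, close to 0 otherwise.
            sq_dist = np.sum((guidance - g_n) ** 2, axis=-1)
            adapt = np.exp(-0.5 * sq_dist)
            out += adapt * weight[dy, dx] * v_n
    return out

# Constant guidance: reduces to an ordinary zero-padded 3x3 filter.
flat = pixel_adaptive_conv(np.ones((4, 4)), np.zeros((4, 4, 1)), np.ones((3, 3)))

# Guidance with a sharp vertical edge: neighbours across the edge are ignored.
edge_guidance = np.zeros((4, 4, 1))
edge_guidance[:, 2:, 0] = 100.0
edged = pixel_adaptive_conv(np.ones((4, 4)), edge_guidance, np.ones((3, 3)))
```

With constant guidance the adaptive factor is 1 everywhere and the operation is a standard convolution; with an edge in the guidance, each pixel aggregates only neighbours on its own side, which is the behaviour that lets semantic features sharpen depth features at object boundaries.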

Code Implementations: 1 repo
