CV · Jul 14, 2020

Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance

Marvin Klingner, Jan-Aike Termöhlen, Jonas Mikolajczyk, Tim Fingscheidt

arXiv:2007.06936v230.8393 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key limitation for applications such as autonomous driving by improving depth estimation accuracy in dynamic scenes, though it is an incremental advance over existing self-supervised methods.

The paper tackles the problem of moving objects that violate the static-world assumption in self-supervised monocular depth estimation, and proposes a method that exceeds all baselines on the Eigen split without test-time refinement.

Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images, which is trainable on arbitrary image sequences without requiring depth labels, e.g., from a LiDAR sensor. In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. Specifically, we propose (i) mutually beneficial cross-domain training of (supervised) semantic segmentation and self-supervised depth estimation with task-specific network heads, (ii) a semantic masking scheme providing guidance to prevent moving DC objects from contaminating the photometric loss, and (iii) a detection method for frames with non-moving DC objects, from which the depth of DC objects can be learned. We demonstrate the performance of our method on several benchmarks, in particular on the Eigen split, where we exceed all baselines without test-time refinement.
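The abstract does not spell out how the semantic masking in (ii) or the frame detection in (iii) is computed, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a boolean dynamic-class mask per frame, a plain L1 photometric error, and an IoU comparison between the target mask and a source mask warped into the target view as one possible way to flag frames whose DC objects appear static. The function names and the 0.5 threshold are hypothetical.

```python
import numpy as np


def masked_photometric_loss(target, warped, dc_mask):
    """Mean absolute photometric error over pixels outside the DC mask.

    target, warped: float arrays of shape (H, W, 3)
    dc_mask: bool array of shape (H, W), True on dynamic-class pixels
    (Assumption: plain L1 error; the paper's exact loss may differ.)
    """
    per_pixel = np.abs(target - warped).mean(axis=-1)  # shape (H, W)
    keep = ~dc_mask                                    # pixels that still contribute
    if keep.sum() == 0:
        return 0.0                                     # nothing left to supervise
    return float(per_pixel[keep].mean())


def mask_iou(mask_a, mask_b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return 1.0 if union == 0 else float(inter / union)


def dc_objects_look_static(target_mask, warped_source_mask, iou_threshold=0.5):
    """Hypothetical criterion: a high overlap between the target DC mask and
    the warped source DC mask suggests the DC objects did not move."""
    return mask_iou(target_mask, warped_source_mask) >= iou_threshold
```

In this sketch, a frame for which `dc_objects_look_static` returns `True` would be trained with an all-`False` mask, so that DC pixels contribute to the loss, while other frames would pass the predicted DC mask to `masked_photometric_loss`.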

