CV · Mar 31, 2021

Full Surround Monodepth from Multiple Cameras

arXiv:2104.00152v1 · 63 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for cost-effective, full-surround depth sensing in robotics and autonomous driving, representing an incremental improvement over existing methods.

The paper tackles the limited scene coverage of self-supervised monocular depth estimation by extending it to large-baseline multi-camera rigs. A single network produces dense, consistent, and scale-aware point clouds that cover the same 360-degree field of view as a typical LiDAR scanner.

Self-supervised monocular depth and ego-motion estimation is a promising approach to replace or supplement expensive depth sensors such as LiDAR for robotics applications like autonomous driving. However, most research in this area focuses on a single monocular camera or stereo pairs that cover only a fraction of the scene around the vehicle. In this work, we extend monocular self-supervised depth and ego-motion estimation to large-baseline multi-camera rigs. Using generalized spatio-temporal contexts, pose consistency constraints, and carefully designed photometric loss masking, we learn a single network generating dense, consistent, and scale-aware point clouds that cover the same full surround 360 degree field of view as a typical LiDAR scanner. We also propose a new scale-consistent evaluation metric more suitable to multi-camera settings. Experiments on two challenging benchmarks illustrate the benefits of our approach over strong baselines.
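The pose consistency constraint mentioned in the abstract can be illustrated with a small sketch. On a rigid rig, every camera observes the same vehicle motion, so the ego-motion predicted for one camera, once re-expressed in a reference camera's frame through the known extrinsics, should match the reference camera's own prediction. The code below is an illustration under stated assumptions, not the authors' implementation: the function names (`make_pose`, `to_reference_frame`, `pose_consistency_loss`) are hypothetical, the extrinsics are assumed to be camera-to-vehicle transforms, and the penalty (squared translation error plus squared relative rotation angle) is one plausible choice that may differ from the loss used in the paper.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 rigid transform from a yaw angle (about z) and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


def to_reference_frame(T_i, X_i, X_ref):
    """Express camera i's ego-motion T_i in the reference camera's frame.

    X_i and X_ref are camera-to-vehicle extrinsics (assumed known and fixed).
    Computes X_ref^-1 @ X_i @ T_i @ X_i^-1 @ X_ref.
    """
    A = np.linalg.inv(X_ref) @ X_i
    return A @ T_i @ np.linalg.inv(A)


def pose_consistency_loss(T_ref, T_i_in_ref):
    """Squared translation error plus squared relative rotation angle (radians)."""
    t_err = np.sum((T_ref[:3, 3] - T_i_in_ref[:3, 3]) ** 2)
    R_rel = T_ref[:3, :3].T @ T_i_in_ref[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return float(t_err + np.arccos(cos_angle) ** 2)


# Hypothetical rig: a front camera and a left camera rotated by 90 degrees.
X_front = make_pose(0.0, [2.0, 0.0, 1.5])
X_left = make_pose(np.pi / 2, [1.0, 1.0, 1.5])

# Ground-truth vehicle motion between two frames (slight turn, 1 m forward).
M = make_pose(0.05, [1.0, 0.0, 0.0])

# The same motion as seen from each camera's own frame.
T_front = np.linalg.inv(X_front) @ M @ X_front
T_left = np.linalg.inv(X_left) @ M @ X_left

# A perfect left-camera prediction agrees with the front camera after conversion.
consistent_loss = pose_consistency_loss(
    T_front, to_reference_frame(T_left, X_left, X_front)
)

# A wrong left-camera prediction (different turn angle) is penalized.
T_left_bad = np.linalg.inv(X_left) @ make_pose(0.10, [1.0, 0.0, 0.0]) @ X_left
inconsistent_loss = pose_consistency_loss(
    T_front, to_reference_frame(T_left_bad, X_left, X_front)
)
```

In this toy setup `consistent_loss` is zero up to floating-point error, while `inconsistent_loss` is strictly positive. In training, such a term would be added to the photometric objective so that the per-camera pose predictions agree with one another.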

