CV · LG · RO · Jul 25, 2020

Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach

arXiv:2007.12918v1 · 6 citations
AI Analysis

The work targets efficient large-scale dynamic mapping and autonomous driving by enabling 3D landmark positioning from a monocular camera and GPS alone. It is incremental, however, because it builds on existing multi-view geometry and self-supervised learning methods.

The paper tackles 3D mapping from crowdsourced visual data without assuming known camera intrinsics. On a KITTI-based traffic sign dataset, it achieves an average single-journey relative positioning accuracy of 39 cm and an absolute positioning accuracy of 1.26 m.
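To make the reported figures concrete, the sketch below averages the Euclidean distance between estimated and ground-truth sign positions. It assumes each accuracy number is a mean Euclidean error in metres. The relative figure would be computed in a local frame and the absolute figure in a global, GPS-referenced frame; the exact frame definitions are given in the paper, not here. The function name `mean_position_error` is hypothetical.

```python
import math


def mean_position_error(estimates, ground_truth):
    """Average Euclidean distance (metres) between matched 3D positions.

    Assumption: estimates[i] and ground_truth[i] describe the same traffic
    sign and are expressed in the same coordinate frame.
    """
    if not estimates or len(estimates) != len(ground_truth):
        raise ValueError("need equal-length, non-empty lists of 3D points")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    total = sum(math.dist(e, g) for e, g in zip(estimates, ground_truth))
    return total / len(estimates)


# Usage: two signs, off by 0.4 m and 0.2 m along z, give a mean error of about 0.3 m.
print(mean_position_error([(0, 0, 0), (1, 1, 1)], [(0, 0, 0.4), (1, 1, 1.2)]))
```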

The ability to efficiently utilize crowdsourced visual data carries immense potential for the domains of large-scale dynamic mapping and autonomous driving. However, state-of-the-art methods for crowdsourced 3D mapping assume prior knowledge of camera intrinsics. In this work, we propose a framework that estimates the 3D positions of semantically meaningful landmarks such as traffic signs without assuming known camera intrinsics, using only a monocular color camera and GPS. We utilize multi-view geometry as well as deep-learning-based self-calibration, depth, and ego-motion estimation for traffic sign positioning, and show that combining their strengths is important for increasing the map coverage. To facilitate research on this task, we construct and make available a KITTI-based 3D traffic sign ground truth positioning dataset. Using our proposed framework, we achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively, on this dataset.
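The combination of multi-view geometry with learned depth can be illustrated with a simplified sketch. It is not the authors' pipeline. A sign seen in two or more frames is triangulated with linear (DLT) triangulation. A sign seen in only one frame falls back to back-projecting its pixel with a predicted depth, which is one way that combining the two sources could increase map coverage. Several things are assumed here: the intrinsics `K` are already estimated (for example by self-calibration), the poses `(R, t)` are world-to-camera ego-motion estimates, and the depth is metric (for example scaled using GPS). All function names are hypothetical.

```python
import numpy as np


def projection_matrix(K, R, t):
    """P = K [R | t]; (R, t) map world points into the camera frame."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def triangulate_dlt(pixels, projections):
    """Linear (DLT) triangulation of one 3D point from >= 2 views."""
    rows = []
    for (u, v), P in zip(pixels, projections):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]


def backproject_with_depth(pixel, depth, K, R, t):
    """Lift one pixel to world coordinates using a (metric) depth prediction."""
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])  # ray[2] == 1
    X_cam = depth * ray  # point at the given depth along the optical axis
    return R.T @ (X_cam - t)  # invert X_cam = R X_world + t


def locate_sign(observations, K):
    """observations: list of (pixel, depth, R, t) tuples for one traffic sign."""
    if len(observations) >= 2:
        # Multi-view geometry when the sign is observed in several frames.
        pixels = [obs[0] for obs in observations]
        projections = [projection_matrix(K, obs[2], obs[3]) for obs in observations]
        return triangulate_dlt(pixels, projections)
    # Single observation: fall back to the learned depth estimate.
    pixel, depth, R, t = observations[0]
    return backproject_with_depth(pixel, depth, K, R, t)
```

With exact, noise-free inputs both branches recover the same point. On real data, DLT is sensitive to pixel noise and short baselines, so a practical system would refine the estimate, for example with nonlinear least squares.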

Code Implementations: 1 repo