A Vision-Centric Approach for Static Map Element Annotation
This work addresses the need for consistent and accurate training data for autonomous driving systems. Its contribution is incremental in that it builds on existing datasets such as nuScenes.
The paper tackles the problem of generating high-quality 3D annotations for static map elements (HD Maps) without LiDAR inputs. Models trained with its annotations achieve lower reprojection errors than with the original nuScenes annotations (e.g., 4.73 vs. 8.03 pixels).
The recent development of online static map element (a.k.a. HD Map) construction algorithms has created a strong demand for data with ground truth annotations. However, currently available public datasets cannot provide training data of sufficient consistency and accuracy. To this end, we present CAMA: a vision-centric approach for Consistent and Accurate Map Annotation. Even without LiDAR inputs, our proposed framework generates high-quality 3D annotations of static map elements. Specifically, the annotations achieve high reprojection accuracy across all surrounding cameras and are spatio-temporally consistent across the whole sequence. We apply our proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map element annotations, models trained with annotations from CAMA achieve lower reprojection errors (e.g., 4.73 vs. 8.03 pixels).
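The abstract reports reprojection error in pixels but does not spell out how it is computed. As a minimal sketch of one common definition, the following assumes an undistorted pinhole camera and takes the mean Euclidean pixel distance between projected 3D annotation points and their 2D image observations. The function names, the 4x4 extrinsic `T_cam_from_world`, and the intrinsic matrix `K` are illustrative assumptions, not the CAMA implementation or the nuScenes API.

```python
import numpy as np


def project_points(points_world, T_cam_from_world, K):
    """Project Nx3 world points to pixel coordinates.

    Assumes a pinhole camera without lens distortion (an assumption of
    this sketch). Returns the Nx2 pixel coordinates and a boolean mask
    marking the points that lie in front of the camera.
    """
    n = len(points_world)
    pts_h = np.hstack([points_world, np.ones((n, 1))])   # Nx4 homogeneous
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]      # Nx3 in camera frame
    in_front = pts_cam[:, 2] > 1e-6                      # positive depth only
    uvw = (K @ pts_cam.T).T                              # Nx3 before division
    # Use a dummy depth of 1 for masked points to avoid dividing by zero.
    depth = np.where(in_front, uvw[:, 2], 1.0)
    uv = uvw[:, :2] / depth[:, None]
    return uv, in_front


def mean_reprojection_error(points_world, observed_uv, T_cam_from_world, K):
    """Mean pixel distance between projected points and 2D observations."""
    uv, in_front = project_points(points_world, T_cam_from_world, K)
    if not in_front.any():
        return float("nan")  # nothing visible in this camera
    dists = np.linalg.norm(uv[in_front] - observed_uv[in_front], axis=1)
    return float(dists.mean())


# Hypothetical example values, chosen only to exercise the functions.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
T = np.eye(4)  # camera frame coincides with the world frame
points = np.array([[0.0, 0.0, 10.0],
                   [1.0, 0.0, 10.0]])
observed = np.array([[803.0, 454.0],
                     [900.0, 450.0]])
error_px = mean_reprojection_error(points, observed, T, K)
```

Under this definition, a multi-camera score would average the per-camera errors over all surrounding views and frames. Whether the paper aggregates in that way is not stated in the abstract.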