CV · Jul 27, 2025

VESPA: Towards un(Human)supervised Open-World Pointcloud Labeling for Autonomous Driving

arXiv:2507.20397v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the data-labeling bottleneck in autonomous driving by offering a scalable way to generate 3D pseudolabels without ground-truth annotations, though it appears incremental, since it mainly combines existing modalities.

The paper tackles the high cost of manually annotating 3D point clouds for autonomous driving by introducing VESPA, a multimodal autolabeling pipeline that fuses LiDAR and camera data with vision-language models. On the nuScenes dataset, it achieves an AP of 52.95% for object discovery and up to 46.54% for multiclass object detection.

Data collection for autonomous driving is rapidly accelerating, but manual annotation, especially for 3D labels, remains a major bottleneck due to its high cost and labor intensity. Autolabeling has emerged as a scalable alternative, allowing the generation of labels for point clouds with minimal human intervention. While LiDAR-based autolabeling methods leverage geometric information, they struggle with inherent limitations of LiDAR data, such as sparsity, occlusions, and incomplete object observations. Furthermore, these methods typically operate in a class-agnostic manner, offering limited semantic granularity. To address these challenges, we introduce VESPA, a multimodal autolabeling pipeline that fuses the geometric precision of LiDAR with the semantic richness of camera images. Our approach leverages vision-language models (VLMs) to enable open-vocabulary object labeling and to refine detection quality directly in the point cloud domain. VESPA supports the discovery of novel categories and produces high-quality 3D pseudolabels without requiring ground-truth annotations or HD maps. On the nuScenes dataset, VESPA achieves an AP of 52.95% for object discovery and up to 46.54% for multiclass object detection, demonstrating strong performance in scalable 3D scene understanding. Code will be available upon acceptance.
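Fusing LiDAR geometry with camera semantics commonly starts by projecting 3D points into the image so that 2D semantic labels can be transferred to them. The sketch below illustrates that generic step under stated assumptions; it is not VESPA's implementation. It assumes a single 3x4 projection matrix `P = K [R | t]` that already folds in the LiDAR-to-camera extrinsics, and it uses axis-aligned 2D boxes with class names as a stand-in for whatever open-vocabulary detector or VLM output the pipeline actually consumes. The function names `project_point`, `label_points`, and `vote_cluster_label` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Box2D = Tuple[str, Tuple[float, float, float, float]]  # (class name, (u_min, v_min, u_max, v_max))


def project_point(p: Point, P: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix P (assumed K [R | t]); None if behind the camera."""
    x, y, z = p
    u_h = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
    v_h = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
    w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
    if w <= 1e-6:  # point lies behind (or on) the image plane
        return None
    return u_h / w, v_h / w


def label_points(points: List[Point], P: Sequence[Sequence[float]],
                 boxes: List[Box2D], width: int, height: int) -> List[Optional[str]]:
    """Give each point the class of the first 2D box its projection falls in (hypothetical rule)."""
    labels: List[Optional[str]] = []
    for p in points:
        label = None
        uv = project_point(p, P)
        if uv is not None:
            u, v = uv
            if 0 <= u < width and 0 <= v < height:  # keep only projections inside the image
                for name, (u0, v0, u1, v1) in boxes:
                    if u0 <= u <= u1 and v0 <= v <= v1:
                        label = name
                        break  # first match wins when boxes overlap
        labels.append(label)
    return labels


def vote_cluster_label(labels: List[Optional[str]]) -> Optional[str]:
    """Label a point cluster by majority vote over its points' transferred labels."""
    counts = Counter(lbl for lbl in labels if lbl is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Usage: identity extrinsics, focal length 100 px, principal point (50, 50), 100x100 image.
P = [[100.0, 0.0, 50.0, 0.0],
     [0.0, 100.0, 50.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (-3.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
boxes = [("car", (40.0, 40.0, 70.0, 70.0))]
labels = label_points(points, P, boxes, width=100, height=100)
print(labels)                      # ['car', 'car', None, None]
print(vote_cluster_label(labels))  # car
```

Voting over a whole cluster, rather than trusting each point on its own, is one simple way to tolerate the sparsity and partial occlusion the abstract mentions: a few points that project onto the wrong region are outvoted by the rest.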

