SPIRAL: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
This work addresses the need for efficient generation of labeled LiDAR scenes with consistent semantics for autonomous driving and robotics, proposing a method that outperforms two-step approaches that pair a generative model with a separate segmentation model.
The paper tackles the problem of generating labeled LiDAR scenes in the range view. Existing range-view methods produce only unlabeled data and must rely on separate segmentation models for labels, which leads to poor consistency between geometry and semantics. The result is Spiral, a diffusion model that simultaneously generates depth, reflectance, and semantic maps. It achieves state-of-the-art performance with the smallest parameter count on the SemanticKITTI and nuScenes datasets, and its outputs serve as effective synthetic training data that reduces labeling effort.
Recent diffusion models have enabled great success in LiDAR-based large-scale 3D scene generation. While recent voxel-based approaches can generate both geometric structures and semantic labels, existing range-view methods are limited to producing unlabeled LiDAR scenes. Relying on pretrained segmentation models to predict the semantic maps often results in suboptimal cross-modal consistency. To address this limitation while preserving the advantages of range-view representations, such as computational efficiency and simplified network design, we propose Spiral, a novel range-view LiDAR diffusion model that simultaneously generates depth and reflectance images along with semantic maps. Furthermore, we introduce new semantic-aware metrics to evaluate the quality of the generated labeled range-view data. Experiments on the SemanticKITTI and nuScenes datasets demonstrate that Spiral achieves state-of-the-art performance with the smallest parameter count, outperforming two-step methods that combine generative and segmentation models. Additionally, we validate that range images generated by Spiral can be effectively used for synthetic data augmentation in downstream segmentation training, significantly reducing the labeling effort on LiDAR data.
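The abstract does not say how point clouds are mapped into range images. As a hedged illustration of what a labeled range-view sample looks like, the sketch below uses the spherical projection commonly applied in range-view LiDAR pipelines. The 64×1024 resolution and the +3°/-25° vertical field of view are assumptions roughly matching the 64-beam sensor used in SemanticKITTI, and the function name `project_to_range_view` is hypothetical, not part of Spiral. It turns a labeled point cloud into three pixel-aligned grids (depth, reflectance, semantic class), the same three outputs Spiral generates jointly.

```python
import math
from typing import List, Optional, Tuple

# A labeled LiDAR point: (x, y, z, reflectance, semantic_class)
Point = Tuple[float, float, float, float, int]


def project_to_range_view(
    points: List[Point],
    height: int = 64,             # assumed number of laser beams (rows)
    width: int = 1024,            # assumed horizontal resolution (columns)
    fov_up_deg: float = 3.0,      # assumed upper vertical field-of-view limit
    fov_down_deg: float = -25.0,  # assumed lower vertical field-of-view limit
):
    """Hypothetical sketch: spherically project labeled points onto aligned
    depth, reflectance, and semantic grids of shape (height, width).

    Empty pixels hold depth -1.0, reflectance 0.0, and label None.
    When several points fall into the same pixel, the closest one is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down  # total vertical field of view in radians

    depth = [[-1.0] * width for _ in range(height)]
    refl = [[0.0] * width for _ in range(height)]
    sem: List[List[Optional[int]]] = [[None] * width for _ in range(height)]

    for x, y, z, r, label in points:
        d = math.sqrt(x * x + y * y + z * z)
        if d == 0.0:
            continue  # skip degenerate points at the sensor origin
        yaw = math.atan2(y, x)    # azimuth angle in [-pi, pi]
        pitch = math.asin(z / d)  # elevation angle
        # Normalized image coordinates in [0, 1]
        u = 0.5 * (1.0 - yaw / math.pi)
        v = 1.0 - (pitch - fov_down) / fov
        # Scale to pixel indices and clamp to the image bounds
        col = min(width - 1, max(0, int(math.floor(u * width))))
        row = min(height - 1, max(0, int(math.floor(v * height))))
        # Keep only the nearest return per pixel
        if depth[row][col] < 0.0 or d < depth[row][col]:
            depth[row][col] = d
            refl[row][col] = r
            sem[row][col] = label

    return depth, refl, sem
```

Because all three grids share one pixel indexing, each pixel's class label is tied to exactly one depth and reflectance value. A model that produces the three channels together can keep them aligned by construction, whereas labels predicted afterward by a separate segmentation model may disagree with the generated geometry, which is the consistency issue the abstract describes.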