Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving
This work addresses the challenge of detecting rare objects in autonomous driving scenarios, which matters for special cases such as emergency rescue. The approach is incremental, however, as it builds on existing 3D detection models.
The paper tackles the problem of detecting rare fine-grained objects, such as police cars and ambulances, in 3D LiDAR point clouds for autonomous driving, a setting in which existing methods depend on large labeled datasets. It proposes a generalized few-shot 3D object detection framework that detects novel classes from only a few training samples while maintaining accuracy on common objects, as demonstrated on the nuScenes dataset.
Recent years have witnessed huge successes in 3D object detection for recognizing common objects in autonomous driving (e.g., vehicles and pedestrians). However, most methods rely heavily on a large amount of well-labeled training data. This limits their ability to detect rare fine-grained objects (e.g., police cars and ambulances), which is important for special cases such as emergency rescue. To detect both common and rare objects simultaneously, we propose a novel task, called generalized few-shot 3D object detection, in which a large amount of training data is available for common (base) objects, but only a few samples for rare (novel) classes. Specifically, we analyze in depth the differences between images and point clouds, and then present a practical principle for the few-shot setting on 3D LiDAR datasets. To solve this task, we propose a simple and effective detection framework, including (1) an incremental fine-tuning method that extends existing 3D detection models to recognize both common and rare objects, and (2) a sample adaptive balance loss that alleviates the long-tailed data distribution in autonomous driving scenarios. On the nuScenes dataset, we conduct extensive experiments to demonstrate that our approach can successfully detect the rare (novel) classes that have only a few training samples, while also maintaining the detection accuracy on common objects.
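The abstract does not specify the form of the sample adaptive balance loss, so the following is only a minimal sketch of the general idea behind re-weighting a loss under a long-tailed class distribution. It uses plain inverse-frequency class weights, which is a generic technique swapped in for illustration and not the loss proposed in the paper. The function names, the class names, and the 98-to-2 class split are all hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1 / class count.

    Weights are rescaled so that their mean over classes is 1.
    Generic re-weighting for illustration; NOT the paper's loss.
    """
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {cls: w * scale for cls, w in raw.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs:   list of dicts mapping class name -> predicted probability
    labels:  list of ground-truth class names
    weights: dict mapping class name -> loss weight
    """
    total = 0.0
    norm = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        norm += w
    return total / norm


# Hypothetical long-tailed toy set: 98 common samples, 2 rare samples.
labels = ["car"] * 98 + ["ambulance"] * 2
weights = inverse_frequency_weights(labels)

# A detector that is confident on the common class but poor on the rare one.
probs = [{"car": 0.9, "ambulance": 0.1}] * 100

uniform = {"car": 1.0, "ambulance": 1.0}
print("unweighted loss:", weighted_cross_entropy(probs, labels, uniform))
print("re-weighted loss:", weighted_cross_entropy(probs, labels, weights))
```

With uniform weights the two rare samples barely affect the average loss, whereas with inverse-frequency weights each class contributes equally, so errors on the rare class are no longer drowned out by the common class.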