TSceneJAL: Joint Active Learning of Traffic Scenes for 3D Object Detection
This work addresses dataset inefficiency in autonomous driving systems, offering a method that reduces labeling costs while improving detection performance; the contribution is incremental, however, since it builds on existing active learning techniques.
The paper tackles the high cost and redundancy in autonomous driving datasets by proposing TSceneJAL, a joint active learning framework that efficiently samples balanced, diverse, and complex traffic scenes from labeled and unlabeled data, achieving up to 12% improvements in 3D object detection tasks.
Most autonomous driving (AD) datasets incur substantial collection and labeling costs, yet inevitably contain many low-quality and redundant data instances, which compromises both performance and efficiency. Many applications in AD systems require high-quality training datasets built from both existing datasets and newly collected data. In this paper, we propose a traffic scene joint active learning (TSceneJAL) framework that efficiently samples balanced, diverse, and complex traffic scenes from both labeled and unlabeled data. The novelty of this framework is threefold: 1) a scene sampling scheme based on category entropy, which identifies scenes containing multiple object classes and thus mitigates class imbalance for the active learner; 2) a similarity sampling scheme, estimated through a directed graph representation and a marginalized kernel algorithm, which picks sparse and diverse scenes; and 3) an uncertainty sampling scheme, predicted by a mixture density network, which selects the instances whose regression outcomes are the most uncertain or complex for the learner. Finally, integrating these three schemes into a joint selection strategy yields an optimal, high-value data subset. Experiments on the KITTI, Lyft, nuScenes, and SUScape datasets demonstrate that our approach outperforms existing state-of-the-art methods on 3D object detection tasks by up to 12%.
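The category-entropy scheme can be illustrated by scoring each scene with the Shannon entropy of its object-class distribution, so that scenes mixing several classes rank above scenes dominated by a single class. This is a minimal sketch under stated assumptions: scenes are given as a hypothetical `{scene_id: [class labels]}` dictionary, entropy is computed in nats over per-scene class counts, and selection is plain top-k. The function names are illustrative, not the paper's.

```python
import math
from collections import Counter

def category_entropy(labels):
    """Shannon entropy (nats) of the object-class distribution of one scene."""
    if not labels:
        return 0.0
    total = len(labels)
    counts = Counter(labels)
    # sum_c p_c * log(1 / p_c); equals 0 for a single-class scene
    return sum((n / total) * math.log(total / n) for n in counts.values())

def select_by_entropy(scenes, k):
    """Return the ids of the k scenes whose class mix has the highest entropy."""
    return sorted(scenes, key=lambda sid: category_entropy(scenes[sid]), reverse=True)[:k]

# Usage with hypothetical per-scene class labels
scenes = {
    "highway": ["Car"] * 5,
    "crossing": ["Car", "Pedestrian", "Cyclist"],
    "street": ["Car", "Car", "Pedestrian"],
}
print(select_by_entropy(scenes, 2))  # ['crossing', 'street']
```

A single-class scene scores 0, while a scene with three equally frequent classes reaches the maximum of ln 3, which is why mixed scenes help counter class imbalance.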
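The similarity scheme relies on a directed graph per scene and a marginalized kernel between graphs. The sketch below implements the classic marginalized kernel for labelled graphs (random walks that stop with a fixed probability, with all pairs of equal-length walks summed by solving one linear system on the product graph), then greedily adds the scene least similar to those already chosen. Everything specific here is an assumption for illustration: the scene graph (objects as nodes labelled by integer class, directed edges weighted by distance in metres), the same-class node kernel, the Gaussian edge kernel, the `stop` and `sigma` values, and the greedy selection rule.

```python
import numpy as np

def _walk(adj, stop):
    """Transition matrix and stop probabilities of a random walk on a directed graph.

    From node i the walk stops with probability `stop`, otherwise it moves to an
    out-neighbour chosen uniformly at random; nodes without out-edges always stop.
    """
    out = adj > 0
    deg = out.sum(axis=1)
    q = np.where(deg > 0, stop, 1.0)
    P = out * ((1.0 - q) / np.maximum(deg, 1))[:, None]
    return P, q

def marginalized_kernel(labels1, adj1, labels2, adj2, stop=0.3, sigma=5.0):
    """Marginalized kernel between two node-labelled directed graphs.

    labels: integer class per node; adj[i, j] > 0 is the length of edge i -> j.
    """
    l1, l2 = np.asarray(labels1), np.asarray(labels2)
    a1, a2 = np.asarray(adj1, dtype=float), np.asarray(adj2, dtype=float)
    n1, n2 = len(l1), len(l2)
    P1, q1 = _walk(a1, stop)
    P2, q2 = _walk(a2, stop)
    kv = (l1[:, None] == l2[None, :]).astype(float)   # node kernel: same class
    ke = np.exp(-(a1[:, None, :, None] - a2[None, :, None, :]) ** 2
                / (2.0 * sigma ** 2))                   # edge kernel on lengths
    # Product-graph transitions T[(a, b), (i, j)] = P1[a,i] P2[b,j] kv[i,j] ke[a,b,i,j]
    T = np.einsum("ai,bj->abij", P1, P2) * kv[None, None] * ke
    T = T.reshape(n1 * n2, n1 * n2)
    r = np.outer(q1, q2).ravel()                        # both walks stop here
    R = np.linalg.solve(np.eye(n1 * n2) - T, r)         # sums walks of every length
    s = (np.outer(np.full(n1, 1.0 / n1), np.full(n2, 1.0 / n2)) * kv).ravel()
    return float(s @ R)

def normalized_kernel(g1, g2, **kw):
    """Cosine-normalized kernel in [0, 1]; each graph is a (labels, adj) pair."""
    k12 = marginalized_kernel(*g1, *g2, **kw)
    k11 = marginalized_kernel(*g1, *g1, **kw)
    k22 = marginalized_kernel(*g2, *g2, **kw)
    return float(k12 / np.sqrt(k11 * k22))

def select_diverse(graphs, k):
    """Greedy pick: start from the scene least similar on average, then repeatedly
    add the scene whose highest similarity to the chosen set is lowest."""
    n = len(graphs)
    S = np.array([[normalized_kernel(g, h) for h in graphs] for g in graphs])
    chosen = [int(np.argmin(S.mean(axis=1)))]
    while len(chosen) < min(k, n):
        closest = S[:, chosen].max(axis=1)
        closest[chosen] = np.inf                        # never re-pick a scene
        chosen.append(int(np.argmin(closest)))
    return chosen

# Usage: classes 0 = car, 1 = pedestrian, 2 = cyclist (hypothetical encoding)
car_ped = ([0, 1], [[0.0, 10.0], [0.0, 0.0]])     # car -> pedestrian, 10 m
cyclists = ([2, 2], [[0.0, 8.0], [0.0, 0.0]])     # cyclist -> cyclist, 8 m
print(select_diverse([car_ped, car_ped, cyclists], 2))  # [2, 0]
```

Because each walk stops with positive probability, every row of the product-graph transition matrix sums to less than 1, so `I - T` is invertible and the infinite sum over walk lengths is well defined. The duplicate `car_ped` scene is never picked twice, which is the redundancy-avoiding behavior the similarity scheme aims for.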
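For the uncertainty scheme, a mixture density network outputs a Gaussian mixture for each regressed box parameter. One common way to turn such an output into a scalar uncertainty is the law of total variance: the weighted mean of the component variances (aleatoric part) plus the weighted spread of the component means (epistemic part). The sketch below assumes that decomposition, sums it over the box dimensions, and scores a scene by its most uncertain box; the abstract does not state whether TSceneJAL uses exactly this formula or aggregation, and the `(pi, mu, sigma)` layout is hypothetical.

```python
import numpy as np

def box_uncertainty(pi, mu, sigma):
    """Total predictive variance of one box from a K-component Gaussian mixture.

    pi: (K,) mixture weights; mu, sigma: (K, D) per-component means and
    standard deviations for D regressed box parameters (assumed layout).
    """
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()                            # make sure the weights sum to 1
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = pi @ mu                                # (D,) mixture mean
    aleatoric = pi @ sigma ** 2                   # expected component variance
    epistemic = pi @ (mu - mean) ** 2             # variance of component means
    return float(np.sum(aleatoric + epistemic))   # total variance, summed over D

def scene_uncertainty(boxes):
    """Hypothetical aggregation: a scene is as uncertain as its most uncertain box."""
    return max((box_uncertainty(*b) for b in boxes), default=0.0)

def select_uncertain(scenes, k):
    """Return the ids of the k scenes with the highest uncertainty score."""
    return sorted(scenes, key=lambda sid: scene_uncertainty(scenes[sid]), reverse=True)[:k]

# Usage: each box is a (pi, mu, sigma) triple for a single regressed parameter
scenes = {
    "confident": [([1.0], [[3.0]], [[0.1]])],
    "ambiguous": [([0.5, 0.5], [[0.0], [2.0]], [[1.0], [1.0]])],
}
print(select_uncertain(scenes, 1))  # ['ambiguous']
```

In the "ambiguous" box the two components disagree on the mean (0 vs. 2), so the epistemic term adds 1.0 on top of the aleatoric 1.0, which is what makes such regression outcomes "unclear" in the sense of the abstract.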
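The abstract does not specify how the three schemes are combined into the joint selection strategy. As one hypothetical illustration only, the sketch below min-max normalizes each per-scene score and ranks scenes by a weighted sum. The weights, the normalization, and the use of `1 - max similarity` as a diversity score are assumptions, not the paper's strategy.

```python
import numpy as np

def minmax(x):
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def joint_select(entropy, diversity, uncertainty, k, weights=(1.0, 1.0, 1.0)):
    """Hypothetical joint rule: weighted sum of normalized per-scene scores, top-k."""
    score = (weights[0] * minmax(entropy)
             + weights[1] * minmax(diversity)
             + weights[2] * minmax(uncertainty))
    return [int(i) for i in np.argsort(-score, kind="stable")[:k]]

# Usage: diversity could be 1 - (max similarity to the already-labeled pool)
entropy = [0.0, 1.1, 0.6]
diversity = [0.2, 0.9, 0.5]
uncertainty = [3.0, 1.0, 2.0]
print(joint_select(entropy, diversity, uncertainty, k=2))  # [1, 2]
```

Normalizing first keeps one score with a large numeric range (here, uncertainty) from dominating the others; a cascaded filter (entropy, then diversity, then uncertainty) would be an equally plausible alternative design.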