LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
This work addresses annotation cost reduction for autonomous driving datasets, but its contribution is incremental: it applies existing methods to a specific domain.
The paper tackles the problem of reducing annotation costs and dataset size for autonomous driving LiDAR datasets by evaluating active learning-based dataset distillation on a subset of Semantic-KITTI. It shows that, with data augmentation, full-dataset accuracy is reached using only 60% of the samples from the selected dataset configuration.
Autonomous driving (AD) datasets have grown progressively in size over the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL remains relatively unexplored for AD datasets, especially for point cloud data from LiDARs. This paper performs a principled evaluation of AL-based dataset distillation on a quarter (1/4) of the large Semantic-KITTI dataset. Furthermore, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration. This yields faster training and subsequent savings in annotation costs.
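The sample-selection step of a Bayesian AL loop can be illustrated with a minimal sketch. The abstract does not state which acquisition function is used, so the BALD (Bayesian Active Learning by Disagreement) score below is an assumption chosen for illustration, as are the function names `bald_scores` and `select_batch` and the toy probabilities. The input is assumed to be class probabilities from several stochastic forward passes (for example, with dropout kept active at inference), already aggregated to one distribution per pool sample.

```python
import numpy as np

def bald_scores(probs):
    """BALD mutual information per pool sample.

    probs: array of shape (T, N, C) holding class probabilities from
    T stochastic forward passes over N unlabeled samples and C classes.
    (Hypothetical input layout, assumed for this sketch.)
    """
    eps = 1e-12  # avoids log(0)
    mean_probs = probs.mean(axis=0)  # (N, C)
    # Entropy of the averaged prediction: total uncertainty.
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    # Average entropy of each individual pass: uncertainty every pass agrees on.
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    # The difference is high when the passes disagree with each other.
    return predictive_entropy - expected_entropy

def select_batch(probs, budget):
    """Indices of the `budget` pool samples with the highest BALD score."""
    scores = bald_scores(probs)
    return np.argsort(-scores)[:budget]

# Toy pool: 2 passes, 3 samples, 2 classes.
toy_probs = np.array([
    [[1.0, 0.0], [0.5, 0.5], [1.0, 0.0]],  # pass 1
    [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]],  # pass 2
])
toy_scores = bald_scores(toy_probs)
toy_pick = select_batch(toy_probs, budget=1)
```

In the toy pool, sample 0 is the one on which the two passes disagree, so it receives the highest score and is selected; sample 1 is uncertain but consistently so, and sample 2 is confidently predicted, so both score near zero. In an AL loop, the selected samples would be annotated, moved to the labeled set, and the model retrained before the next selection round.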