CV · LG · Feb 6, 2022

LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation

arXiv:2202.02661v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This addresses annotation-cost reduction for autonomous driving datasets, but it is incremental, as it applies existing methods to a specific domain.

The paper tackles the problem of reducing annotation costs and dataset size for autonomous driving LiDAR datasets. It evaluates active-learning-based dataset distillation on a subset of Semantic-KITTI and shows that, with data augmentation, a model reaches full-dataset accuracy using only 60% of the samples.
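The pool-based loop behind this kind of dataset distillation can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a fixed number of samples acquired per round and a stopping budget expressed as a fraction of the pool (0.6 here, mirroring the 60% figure). `run_al_loop`, `train_fn`, and `score_fn` are hypothetical names; in practice `train_fn` would retrain a segmentation model on the labeled scans and `score_fn` would return an informativeness score such as BALD.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

Model = TypeVar("Model")


def run_al_loop(pool_ids: Sequence[Hashable],
                train_fn: Callable[[List[Hashable]], Model],
                score_fn: Callable[[Model, Hashable], float],
                initial_ids: Sequence[Hashable],
                batch_size: int,
                budget_fraction: float) -> List[Hashable]:
    """Grow a labeled set round by round until it holds budget_fraction of the pool.

    Hypothetical sketch: each round retrains on the current labeled set,
    scores every unlabeled sample, and "annotates" the top-scoring ones.
    """
    labeled = list(initial_ids)
    labeled_set = set(labeled)
    unlabeled = [i for i in pool_ids if i not in labeled_set]
    budget = round(budget_fraction * len(pool_ids))  # target labeled-set size

    while len(labeled) < budget and unlabeled:
        model = train_fn(labeled)                     # retrain on labeled data
        k = min(batch_size, budget - len(labeled))    # never overshoot the budget
        ranked = sorted(unlabeled, key=lambda i: score_fn(model, i), reverse=True)
        picked = ranked[:k]                           # most informative samples
        labeled.extend(picked)
        picked_set = set(picked)
        unlabeled = [i for i in unlabeled if i not in picked_set]

    return labeled
```

For example, with ten pool items, one seed sample, two acquisitions per round, and a 0.6 budget, the loop trains three times and stops at six labeled items; the last round acquires only one sample so the budget is met exactly.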

Autonomous driving (AD) datasets have grown steadily in size over the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size, yet it remains relatively unexplored for AD datasets, especially for LiDAR point cloud data. This paper performs a principled evaluation of AL-based dataset distillation on one quarter of the large Semantic-KITTI dataset. Furthermore, we demonstrate the gains in model performance due to data augmentation (DA) across different subsets of the AL loop, and we show how DA improves the selection of informative samples to annotate. We observe that, with data augmentation, full-dataset accuracy is achieved using only 60% of the samples from the selected dataset configuration. This yields faster training and a corresponding reduction in annotation costs.
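One way data augmentation can enter a Bayesian acquisition step is sketched below. It assumes BALD (mutual information) computed from Monte Carlo dropout samples, a common Bayesian active learning choice; the abstract does not fix these details, so treat the scoring rule, the yaw-rotation augmentation, and all names (`bald_score`, `yaw_rotation`, `augmented_bald_score`, and the `predict_fn` model wrapper) as hypothetical. Stochastic predictions from several augmented views of a scan are pooled before scoring, which is only valid for augmentations that preserve a one-to-one point correspondence.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical: a segmentation model run with dropout active, mapping an
# (N, D) point array to (N, C) per-point class probabilities.
PredictFn = Callable[[np.ndarray], np.ndarray]
Augment = Callable[[np.ndarray], np.ndarray]


def entropy(p: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy in nats along `axis`."""
    return -np.sum(p * np.log(p + eps), axis=axis)


def bald_score(probs: np.ndarray) -> float:
    """Scan-level BALD score from stochastic predictions.

    probs has shape (S, N, C): S stochastic passes, N points, C classes.
    Per point: H[mean_s p_s] - mean_s H[p_s]; the result is averaged over points.
    """
    predictive = entropy(probs.mean(axis=0))   # (N,) entropy of the mean
    expected = entropy(probs).mean(axis=0)     # (N,) mean of per-pass entropies
    return float((predictive - expected).mean())


def yaw_rotation(angle: float) -> Augment:
    """Rotate x, y, z about the vertical axis; extra columns and point order are kept."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def apply(points: np.ndarray) -> np.ndarray:
        out = points.astype(float)             # copy, so the input is not modified
        out[:, :3] = points[:, :3] @ rot.T     # row vectors: p' = R p
        return out

    return apply


def augmented_bald_score(predict_fn: PredictFn, points: np.ndarray,
                         augmentations: Sequence[Augment], n_mc: int = 8) -> float:
    """Pool n_mc dropout passes per augmented view, then score the scan with BALD."""
    samples = [predict_fn(aug(points))
               for aug in augmentations
               for _ in range(n_mc)]
    return bald_score(np.stack(samples))
```

A scan whose stochastic predictions disagree (for instance, two passes assigning a point to different classes with full confidence) scores close to ln 2 per point, while one with identical predictions on every pass scores zero, so the highest-scoring scans are the ones sent for annotation.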
