SurfaceAug: Closing the Gap in Multimodal Ground Truth Sampling
This work addresses a key bottleneck in multimodal object detection for autonomous driving by providing a more effective data augmentation method. The contribution appears incremental, however, since it builds on existing ground truth sampling approaches.
The paper tackles the performance gap between multimodal and LiDAR-only object detectors by introducing SurfaceAug, a novel ground truth sampling algorithm that pastes objects into both images and point clouds. Trained on KITTI, a multimodal detector using SurfaceAug outperforms existing methods on car detection tasks and establishes a new state of the art for multimodal ground truth sampling.
Despite recent advances in both model architectures and data augmentation, multimodal object detectors still barely outperform their LiDAR-only counterparts. This shortcoming has been attributed to a lack of sufficiently powerful multimodal data augmentation. To address this, we present SurfaceAug, a novel ground truth sampling algorithm. SurfaceAug pastes objects by resampling both images and point clouds, enabling object-level transformations in both modalities. We evaluate our algorithm by training a multimodal detector on KITTI and compare its performance to previous works. We show experimentally that SurfaceAug outperforms existing methods on car detection tasks and establishes a new state of the art for multimodal ground truth sampling.
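To make the idea of pasting an object consistently in both modalities concrete, the following is a minimal sketch and not the SurfaceAug algorithm itself. It assumes the object is available as 3D points with per-point colors, that all points are already in the camera frame, and that a 3x4 projection matrix `P` is known. The names `project_points` and `paste_object` and the `offset` parameter are hypothetical. The object is transformed once in 3D and the image pixels are obtained by reprojection, which is one simple way an object-level transformation can carry over to both modalities. This sketch splats one pixel per point and ignores occlusion, whereas the paper describes resampling both images and point clouds.

```python
import numpy as np


def project_points(points_xyz, P):
    """Project Nx3 camera-frame points to pixel coordinates using a 3x4 matrix P."""
    homogeneous = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    uvw = homogeneous @ P.T
    return uvw[:, :2] / uvw[:, 2:3]


def paste_object(scene_points, scene_image, obj_points, obj_colors, P, offset=None):
    """Paste one object into a point cloud and an image (illustrative only).

    scene_points: Mx3 array, scene_image: HxWx3 array,
    obj_points: Nx3 array, obj_colors: Nx3 array (one color per object point),
    offset: optional 3-vector applied to the object before pasting (assumption).
    """
    # Object-level transformation applied once, in 3D.
    moved = obj_points if offset is None else obj_points + np.asarray(offset)

    # Point cloud modality: append the transformed object points.
    new_points = np.vstack([scene_points, moved])

    # Image modality: reproject the same points and write their colors.
    new_image = scene_image.copy()
    in_front = moved[:, 2] > 0  # keep only points in front of the camera
    uv = np.rint(project_points(moved[in_front], P)).astype(int)
    colors = obj_colors[in_front]

    height, width = new_image.shape[:2]
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    # No depth test is done here: pasted pixels simply overwrite the scene.
    new_image[uv[inside, 1], uv[inside, 0]] = colors[inside]
    return new_points, new_image
```

Because the image pixels are derived from the transformed 3D points, the two modalities stay aligned by construction. A practical implementation would additionally need occlusion handling and dense image resampling, neither of which this sketch attempts.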