Improving Generalization Ability for 3D Object Detection by Learning Sparsity-invariant Features
This work addresses a critical issue for autonomous driving systems by improving generalization from a single source domain, though the contribution is incremental because it builds on existing teacher-student frameworks and alignment techniques.
The paper tackles the problem of performance degradation in 3D object detection when applied to unseen domains with different sensor configurations and scene distributions, achieving superior generalization capabilities compared to baselines and even outperforming some domain adaptation methods.
In autonomous driving, 3D object detection is essential for accurately identifying and tracking objects. Despite the continuous development of technologies for this task, most of them share a significant drawback: they suffer substantial performance degradation when detecting objects in unseen domains. In this paper, we propose a method to improve the generalization ability of 3D object detection when training on a single domain. We primarily focus on generalizing from a single source domain to target domains with distinct sensor configurations and scene distributions. To learn sparsity-invariant features from a single source domain, we selectively subsample the source data to a specific beam density, using confidence scores produced by the current detector to identify the density that matters most to the detector. Subsequently, we employ a teacher-student framework to align the Bird's Eye View (BEV) features across different point cloud densities. We also utilize feature content alignment (FCA) and graph-based embedding relationship alignment (GERA) to guide the detector toward being domain-agnostic. Extensive experiments demonstrate that our method exhibits superior generalization capabilities compared to other baselines. Furthermore, our approach even outperforms certain domain adaptation methods that have access to target domain data.
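The two core steps described above, beam subsampling and BEV feature alignment, can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The function names `assign_beam_ids`, `subsample_beams`, and `bev_alignment_loss`, the uniform elevation binning, and the mean-squared-error loss are all assumptions made for illustration. The abstract does not specify the confidence-based rule for choosing the density or the exact form of FCA and GERA, so neither is sketched here; the ratio `keep_every` is simply passed in.

```python
import numpy as np


def assign_beam_ids(points, num_beams=64):
    """Give each point a hypothetical beam index by binning its elevation angle.

    Assumption: beams are spaced uniformly over the observed elevation range.
    Real sensors may use non-uniform spacing or provide a ring index directly.
    """
    xy_dist = np.linalg.norm(points[:, :2], axis=1)
    elevation = np.arctan2(points[:, 2], xy_dist)
    lo, hi = elevation.min(), elevation.max()
    scaled = (elevation - lo) / max(hi - lo, 1e-9) * num_beams
    return np.clip(scaled.astype(np.int64), 0, num_beams - 1)


def subsample_beams(points, num_beams=64, keep_every=2):
    """Simulate a sparser sensor by keeping only every `keep_every`-th beam."""
    beam_ids = assign_beam_ids(points, num_beams)
    return points[beam_ids % keep_every == 0]


def bev_alignment_loss(teacher_bev, student_bev):
    """Mean squared error between two BEV feature maps of identical shape.

    Assumption: a plain MSE stands in for the alignment terms; the teacher
    map would come from the dense cloud and the student map from the
    subsampled one.
    """
    diff = np.asarray(teacher_bev) - np.asarray(student_bev)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(size=(2000, 3))      # stand-in for an (N, 3) LiDAR cloud
    sparse = subsample_beams(dense, num_beams=64, keep_every=2)
    print(dense.shape, sparse.shape)
```

In a training loop along these lines, the teacher would see `dense`, the student would see `sparse`, and the alignment loss would be added to the usual detection loss so that the student's BEV features stay close to the teacher's regardless of point density.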