Structure Aware and Class Balanced 3D Object Detection on nuScenes Dataset
This work addresses class imbalance and localization precision in 3D object detection for autonomous driving, and is an incremental improvement over existing methods.
The paper tackles class imbalance and localization precision in 3D object detection on the nuScenes dataset. It proposes an auxiliary network that leverages point cloud structure information to improve the Class-balanced Grouping and Sampling (CBGS) model, raising localization accuracy without adding any inference computation.
3D object detection is pivotal for autonomous driving. Point-cloud-based methods have become increasingly popular for 3D object detection, owing to the accurate depth information that point clouds provide. nuTonomy's nuScenes dataset greatly extends commonly used datasets such as KITTI in size, sensor modalities, number of categories, and number of annotations. However, it suffers from severe class imbalance. The Class-balanced Grouping and Sampling (CBGS) paper addresses this issue and proposes an augmentation and sampling strategy. The localization precision of the CBGS model, in turn, is limited by the loss of spatial information in its downscaled feature maps. We propose to enhance the CBGS model with an auxiliary network that makes full use of the structure information of the 3D point cloud to improve localization accuracy. The detachable auxiliary network is jointly optimized with two point-level supervisions, namely foreground segmentation and center estimation. Because it can be detached at test time, the auxiliary network introduces no extra computation during inference.
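The abstract names the two point-level supervisions but does not give their loss functions, box parameterization, or weights. The following is a minimal NumPy sketch of how such targets and a joint auxiliary loss could be formed, under stated assumptions: boxes are `(cx, cy, cz, dx, dy, dz, yaw)` with `(cx, cy, cz)` the geometric center and yaw about the z-axis; foreground segmentation uses a sigmoid focal loss and center estimation uses smooth-L1 on point-to-center offsets, averaged over foreground points. All function names and the weights `w_seg` and `w_ctr` are hypothetical, not the paper's. Only the training-time loss is shown; at test time the auxiliary branch is simply not evaluated, which is why it adds no inference cost.

```python
import numpy as np

def point_in_boxes(points, boxes):
    """(N, M) bool matrix: True if point n lies inside box m.
    Assumed box layout: (cx, cy, cz, dx, dy, dz, yaw), yaw about the z-axis."""
    rel = points[:, None, :3] - boxes[None, :, :3]          # (N, M, 3) offsets from box centers
    cos, sin = np.cos(-boxes[:, 6]), np.sin(-boxes[:, 6])   # rotate by -yaw into box frame
    x = rel[..., 0] * cos - rel[..., 1] * sin
    y = rel[..., 0] * sin + rel[..., 1] * cos
    z = rel[..., 2]
    half = boxes[None, :, 3:6] / 2.0
    return (np.abs(x) <= half[..., 0]) & (np.abs(y) <= half[..., 1]) & (np.abs(z) <= half[..., 2])

def auxiliary_targets(points, boxes):
    """Per-point foreground labels (N,) and center offsets (N, 3); offsets are 0 for background."""
    n = len(points)
    if len(boxes) == 0:
        return np.zeros(n, np.float32), np.zeros((n, 3))
    inside = point_in_boxes(points, boxes)
    fg = inside.any(axis=1)
    box_idx = inside.argmax(axis=1)                 # first containing box (ignored for background)
    offsets = boxes[box_idx, :3] - points[:, :3]    # vector from point to its box center
    offsets[~fg] = 0.0
    return fg.astype(np.float32), offsets

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Mean binary focal loss; the focal form is an assumption, not stated in the abstract."""
    p = 1.0 / (1.0 + np.exp(-logits))
    # numerically stable binary cross-entropy on logits
    ce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return float((alpha_t * (1 - p_t) ** gamma * ce).mean())

def smooth_l1(pred, target, beta=1.0):
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

def auxiliary_loss(seg_logits, center_pred, points, boxes, w_seg=1.0, w_ctr=2.0):
    """Hypothetical joint auxiliary loss: segmentation over all points plus
    center-offset regression averaged over foreground points only."""
    fg, offsets = auxiliary_targets(points, boxes)
    loss_seg = sigmoid_focal_loss(seg_logits, fg)
    n_fg = max(float(fg.sum()), 1.0)
    loss_ctr = float((smooth_l1(center_pred, offsets).sum(axis=1) * fg).sum() / n_fg)
    return w_seg * loss_seg + w_ctr * loss_ctr
```

In a training loop this term would be added to the detection loss of the main network; at inference, only the main detector runs. Restricting the center loss to foreground points reflects the idea that only points on objects carry information about a box center.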