SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection
This work addresses a specific bottleneck in 3D object detection for autonomous driving by enhancing point-based detectors, though the contribution is incremental since it builds on existing set abstraction designs.
The paper tackles the gap between point-based and voxel-based 3D object detectors by proposing SASA, a semantics-augmented set abstraction method that retains more foreground points during down-sampling to improve feature learning. On the KITTI and nuScenes datasets, it brings point-based detectors to performance comparable with state-of-the-art voxel-based methods.
Although point-based networks have been shown to be accurate for 3D point cloud modeling, they still fall behind their voxel-based competitors in 3D detection. We observe that the prevailing set abstraction design for down-sampling points may retain too much unimportant background information, which can affect feature learning for detecting objects. To tackle this issue, we propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA). Technically, we first add a binary segmentation module as a side output to help identify foreground points. Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling. In practice, SASA proves effective in identifying valuable points related to foreground objects and in improving feature learning for point-based 3D detection. Additionally, it is an easy-to-plug-in module that can boost various point-based detectors, including single-stage and two-stage ones. Extensive experiments on the popular KITTI and nuScenes datasets validate the superiority of SASA, lifting point-based detection models to performance comparable with state-of-the-art voxel-based methods.
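
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one way a semantics-guided sampler could work. It assumes that farthest point sampling is reweighted by the foreground score, so that points that are both far from the current selection and likely foreground are picked first. The function name `semantics_guided_fps` and the exponent `gamma` are hypothetical names introduced for illustration, and the scores are assumed to come from the binary segmentation side output.

```python
import numpy as np


def semantics_guided_fps(points, scores, num_samples, gamma=1.0):
    """Score-weighted farthest point sampling (illustrative sketch only).

    points: (N, 3) array of xyz coordinates.
    scores: (N,) array of foreground probabilities in [0, 1].
    gamma:  assumed exponent controlling how strongly scores matter;
            gamma=0 reduces to plain farthest point sampling.
    Returns an array of num_samples distinct indices into points.
    """
    points = np.asarray(points, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be between 1 and N")

    weights = scores ** gamma
    selected = np.empty(num_samples, dtype=int)
    taken = np.zeros(n, dtype=bool)
    # Squared distance from every point to its nearest selected point.
    min_dist = np.full(n, np.inf)

    # Assumption: start from the point with the highest foreground score.
    current = int(np.argmax(scores))
    for i in range(num_samples):
        selected[i] = current
        taken[current] = True
        diff = points - points[current]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        # Weight distances by foreground score; never re-pick a taken point.
        priority = np.where(taken, -np.inf, weights * min_dist)
        current = int(np.argmax(priority))
    return selected
```

Under these assumptions, a low-scoring background point is selected only once its distance to the current selection is large enough to outweigh its small weight, which is how the sampler would keep more foreground points than plain farthest point sampling at the same budget.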