PointPainting: Sequential Fusion for 3D Object Detection
PointPainting addresses a gap in camera-lidar sensor fusion for autonomous vehicles, offering a practical way to improve 3D object detection performance.
The paper tackles sensor fusion for 3D object detection in self-driving cars, where lidar-only methods have outperformed fusion methods. It proposes PointPainting, a sequential fusion method that appends semantic segmentation scores from images to lidar points. The approach yields large improvements on state-of-the-art lidar detectors and sets a new state of the art on the KITTI leaderboard for bird's-eye view detection.
Camera and lidar are important sensor modalities for robotics in general and self-driving cars in particular. The sensors provide complementary information, offering an opportunity for tight sensor fusion. Surprisingly, lidar-only methods outperform fusion methods on the main benchmark datasets, suggesting a gap in the literature. In this work, we propose PointPainting: a sequential fusion method to fill this gap. PointPainting works by projecting lidar points into the output of an image-only semantic segmentation network and appending the class scores to each point. The appended (painted) point cloud can then be fed to any lidar-only method. Experiments show large improvements on three different state-of-the-art methods, PointRCNN, VoxelNet and PointPillars, on the KITTI and nuScenes datasets. The painted version of PointRCNN represents a new state of the art on the KITTI leaderboard for the bird's-eye view detection task. In ablation, we study how the effects of Painting depend on the quality and format of the semantic segmentation output, and demonstrate how latency can be minimized through pipelining.
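The painting step described in the abstract (project each lidar point into the segmentation output, then append the class scores) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `paint_points`, the argument layout, a single pinhole camera, nearest-pixel lookup, and zero scores for points that fall outside the image are all assumptions made for illustration.

```python
import numpy as np


def paint_points(points, seg_scores, lidar_to_cam, cam_intrinsics):
    """Append per-pixel class scores to each lidar point (illustrative sketch).

    points:         (N, 4) array of x, y, z, reflectance in the lidar frame
    seg_scores:     (H, W, C) per-pixel class scores from an image segmentation network
    lidar_to_cam:   (4, 4) homogeneous transform from lidar frame to camera frame
    cam_intrinsics: (3, 3) pinhole camera matrix with last row [0, 0, 1]
    Returns an (N, 4 + C) painted point cloud.
    """
    n = points.shape[0]
    h, w, c = seg_scores.shape

    # Move the points into the camera frame using homogeneous coordinates.
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    cam = (lidar_to_cam @ xyz1.T).T[:, :3]

    # Only points in front of the camera can land on the image plane.
    in_front = cam[:, 2] > 1e-6

    # Perspective projection to pixel coordinates (u = column, v = row).
    uvw = (cam_intrinsics @ cam.T).T
    depth = np.where(in_front, uvw[:, 2], 1.0)  # placeholder depth avoids division by zero
    u = np.floor(uvw[:, 0] / depth).astype(int)
    v = np.floor(uvw[:, 1] / depth).astype(int)

    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Assumption: points that do not project into the image keep all-zero scores.
    scores = np.zeros((n, c), dtype=points.dtype)
    scores[valid] = seg_scores[v[valid], u[valid]]

    return np.hstack([points, scores])
```

The returned array keeps the original point features in its first four columns, so under these assumptions a lidar-only detector would only need its input feature dimension widened from 4 to 4 + C to consume the painted cloud.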