X-view: Non-egocentric Multi-View 3D Object Detector
This work targets a specific bottleneck in LiDAR-based 3D object detection for autonomous driving: perspective-view grid cells become too coarse at long range. It proposes a multi-view approach that yields incremental improvements over existing methods.
The paper tackles the coarse grid partition of distant regions in 3D object detection for autonomous driving. It proposes X-view, a non-egocentric multi-view method that frees the perspective view from the requirement that its origin coincide with the origin of the 3D Cartesian coordinate system. X-view yields consistent improvements when combined with four state-of-the-art 3D detectors on the KITTI and NuScenes datasets.
3D object detection algorithms for autonomous driving reason about 3D obstacles from the bird's-eye view, the perspective view, or both. Recent works attempt to improve detection performance by mining and fusing features from multiple egocentric views. Although the egocentric perspective view alleviates some weaknesses of the bird's-eye view, its sectored grid partition becomes so coarse at long range that targets and their surrounding context mix together, which makes the features less discriminative. In this paper, we generalize the research on 3D multi-view learning and propose a novel multi-view 3D detection method, named X-view, to overcome this drawback of existing multi-view methods. Specifically, X-view removes the traditional constraint that the origin of the perspective view must coincide with the origin of the 3D Cartesian coordinate system. X-view is designed as a general paradigm that can be applied to almost any LiDAR-based 3D detector, whether voxel/grid-based or raw-point-based, with only a small increase in running time. We conduct experiments on the KITTI and NuScenes datasets to demonstrate the robustness and effectiveness of X-view. The results show that X-view obtains consistent improvements when combined with four mainstream state-of-the-art 3D detectors: SECOND, PointRCNN, Part-A^2, and PV-RCNN.
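To make the geometric argument concrete, the following minimal sketch computes the cylindrical perspective coordinates of a point as seen from an arbitrary origin and estimates how wide one azimuth sector is at that point. It is an illustration under stated assumptions, not the authors' implementation: the function names `to_perspective` and `sector_arc_width`, the 1-degree azimuth step, and the example origins are all hypothetical choices made here.

```python
import math


def to_perspective(point, origin=(0.0, 0.0, 0.0)):
    """Return (range, azimuth, height) of `point` as seen from `origin`.

    Assumption: a cylindrical perspective view. With origin (0, 0, 0)
    this is the usual egocentric view; any other origin gives a
    non-egocentric view in the sense described in the abstract.
    """
    dx = point[0] - origin[0]
    dy = point[1] - origin[1]
    dz = point[2] - origin[2]
    rng = math.hypot(dx, dy)      # horizontal distance from the view origin
    azimuth = math.atan2(dy, dx)  # angle in radians, in (-pi, pi]
    return rng, azimuth, dz


def sector_arc_width(point, origin, azimuth_step_deg=1.0):
    """Approximate arc length covered by one azimuth sector at `point`."""
    rng, _, _ = to_perspective(point, origin)
    return rng * math.radians(azimuth_step_deg)


# Hypothetical target 50 m ahead of the sensor.
target = (50.0, 0.0, 0.0)
ego_width = sector_arc_width(target, (0.0, 0.0, 0.0))       # egocentric origin
shifted_width = sector_arc_width(target, (40.0, 0.0, 0.0))  # origin moved forward

print(f"egocentric sector width: {ego_width:.3f} m")
print(f"shifted-origin sector width: {shifted_width:.3f} m")
```

Because the arc length of a sector grows linearly with range, moving the view origin from 50 m to 10 m away from the target makes the sector at the target five times narrower for the same angular step. This is the effect that motivates decoupling the perspective-view origin from the sensor origin.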