CV · May 5, 2024

PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

arXiv:2405.02811v1 · 9 citations · h-index: 30 · ICRA
Originality Highly original
AI Analysis

This paper addresses accuracy and scalability issues in 3D object detection for autonomous driving, with incremental improvements over existing methods.

The paper tackles the information bottleneck that PointNet pooling introduces in 3D object detection. It proposes PVTransformer, which uses an attention module for point-to-voxel aggregation, and reports state-of-the-art performance of 76.5 mAPH L2 on the Waymo Open Dataset, a +1.7 mAPH L2 improvement over the prior art, SWFormer.

3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D detection. Our key idea is to replace the PointNet pooling operation with an attention module, leading to a better point-to-voxel aggregation function. Our design respects the permutation invariance of sparse 3D points while being more expressive than the pooling-based PointNet. Experimental results show our PVTransformer achieves much better performance compared to the latest 3D object detectors. On the widely used Waymo Open Dataset, our PVTransformer achieves state-of-the-art 76.5 mAPH L2, outperforming the prior art of SWFormer by +1.7 mAPH L2.
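The contrast between pooling-based and attention-based point-to-voxel aggregation can be illustrated with a minimal sketch. It is not the paper's architecture: the abstract does not specify the attention module, so the single-query dot-product attention below, the raw point features used as keys and values, and the names `max_pool`, `attention_pool`, and `query` are all assumptions made for illustration. The sketch only shows that both aggregators are permutation invariant, while the attention version weights every point instead of keeping one maximum per channel.

```python
import math
import random


def max_pool(points):
    """PointNet-style aggregation: element-wise max over the points in one voxel."""
    return [max(channel) for channel in zip(*points)]


def attention_pool(points, query):
    """Single-query dot-product attention over the points in one voxel.

    Hypothetical simplification: keys and values are the raw point features,
    and `query` stands in for a learned per-voxel query vector.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, p)) / scale for p in points]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of point features; the order of the points does not matter.
    return [
        sum(w * p[d] for w, p in zip(weights, points))
        for d in range(len(query))
    ]


# Toy data (made up): 6 points with 4 features each, all in one voxel.
rng = random.Random(0)
voxel_points = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
query = [rng.uniform(-1, 1) for _ in range(4)]

shuffled = voxel_points[:]
rng.shuffle(shuffled)

pooled_max = max_pool(voxel_points)
pooled_attn = attention_pool(voxel_points, query)
pooled_attn_shuffled = attention_pool(shuffled, query)
```

Shuffling the points leaves both outputs unchanged, up to floating-point rounding in the attention case. Max pooling keeps only one extreme value per feature channel, whereas the attention output is a weighted combination of all points in the voxel, which is one way to read the abstract's claim that attention is more expressive than pooling.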
