Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
This work addresses efficient multi-view 3D object detection for autonomous driving by introducing StreamPETR, an online method that propagates temporal information frame by frame through object queries and improves both speed and accuracy over existing approaches.
The paper tackles efficient multi-view 3D object detection by proposing StreamPETR, an object-centric temporal modeling framework that delivers significant performance improvements at negligible computation cost. It achieves 67.6% NDS and 65.3% AMOTA on nuScenes, comparable to lidar-based methods. A lightweight version outperforms the state-of-the-art method (SOLOFusion) by 2.3% mAP while running at 1.8x the FPS.
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection. Building upon the sparse query design of the PETR series, we systematically develop an object-centric temporal mechanism. The model runs in an online manner, and long-term historical information is propagated through object queries frame by frame. In addition, we introduce a motion-aware layer normalization to model the movement of objects. Compared to the single-frame baseline, StreamPETR achieves significant performance improvements with only negligible computation cost. On the standard nuScenes benchmark, it is the first online multi-view method to achieve performance comparable to lidar-based methods (67.6% NDS and 65.3% AMOTA). The lightweight version reaches 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP with 1.8x the FPS. Code is available at https://github.com/exiawsh/StreamPETR.git.
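To make the frame-by-frame propagation concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a fixed-length memory of top-scoring object queries and a generic conditional layer normalization whose scale and shift are predicted from per-query motion attributes. The names `QueryMemory`, `motion_aware_layer_norm`, and `run_stream`, the random stand-ins for decoder outputs, and the exact form of the normalization are all illustrative assumptions and may differ from the paper's formulation.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Plain layer normalization over the feature axis (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def motion_aware_layer_norm(queries, motion, w_gamma, w_beta):
    """Conditional layer norm (assumed form): scale/shift come from motion.

    queries: (N, D) object query features
    motion:  (N, M) per-query motion attributes (hypothetical inputs)
    w_gamma, w_beta: (M, D) projections standing in for learned layers
    """
    gamma = motion @ w_gamma
    beta = motion @ w_beta
    return gamma * layer_norm(queries) + beta


class QueryMemory:
    """FIFO memory keeping the top-k scoring queries of recent frames."""

    def __init__(self, max_frames, top_k):
        self.max_frames = max_frames
        self.top_k = top_k
        self.frames = []

    def push(self, queries, scores):
        # Keep only the highest-scoring queries of the current frame.
        keep = np.argsort(-scores)[: self.top_k]
        self.frames.append(queries[keep])
        # Drop the oldest frames once the memory is full.
        self.frames = self.frames[-self.max_frames:]

    def read(self, dim):
        if not self.frames:
            return np.zeros((0, dim))
        return np.concatenate(self.frames, axis=0)


def run_stream(num_frames=5, num_queries=6, dim=8, motion_dim=4, seed=0):
    """Process frames one at a time, returning the context size per frame."""
    rng = np.random.default_rng(seed)
    w_gamma = rng.normal(size=(motion_dim, dim))
    w_beta = rng.normal(size=(motion_dim, dim))
    memory = QueryMemory(max_frames=3, top_k=4)
    context_sizes = []
    for _ in range(num_frames):
        # Random stand-ins for the decoder's queries and confidence scores.
        current = rng.normal(size=(num_queries, dim))
        scores = rng.random(num_queries)
        # Read propagated queries and condition them on (random) motion.
        past = memory.read(dim)
        motion = rng.normal(size=(past.shape[0], motion_dim))
        past = motion_aware_layer_norm(past, motion, w_gamma, w_beta)
        # Current queries would attend to this combined set in a real model.
        context = np.concatenate([current, past], axis=0)
        context_sizes.append(context.shape[0])
        memory.push(current, scores)
    return context_sizes


if __name__ == "__main__":
    print(run_stream())
```

In this sketch the number of queries seen per frame is bounded by `num_queries + max_frames * top_k`, so the per-frame cost does not grow with sequence length. That is one plausible reading of why propagating queries can add little computation over a single-frame baseline.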