CV · Apr 9, 2023

BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo

Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao

arXiv:2304.04185v1 · 2.83 citations · h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses performance bottlenecks in 3D object detection for autonomous driving systems, representing an incremental improvement over existing methods.

The paper tackles the depth ambiguity problem in multi-view 3D object detection by proposing BEVStereo++, which uses a dynamic temporal stereo strategy to reduce computational costs and handle mobile objects in outdoor scenes, achieving state-of-the-art results on Waymo and nuScenes datasets.

Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods have hit a performance bottleneck. Intuitively, leveraging temporal multi-view stereo (MVS) is a natural way to tackle this ambiguity. However, traditional MVS approaches have two limitations when applied to 3D object detection: 1) measuring affinity among all views incurs high computational cost; 2) they struggle with outdoor scenes, where objects are often moving. To this end, we propose BEVStereo++, which introduces a dynamic temporal stereo strategy to reduce the harm that temporal stereo can cause in these two scenarios. Going one step further, we apply a Motion Compensation Module and long-sequence Frame Fusion to BEVStereo++, which further improve performance and reduce errors. Without bells and whistles, BEVStereo++ achieves state-of-the-art (SOTA) results on both the Waymo and nuScenes datasets.
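The abstract does not spell out how the dynamic strategy decides when to trust temporal stereo. As a rough illustration only, the sketch below shows one generic way such a decision could work for a single pixel: a stereo depth distribution is built by matching the current-frame feature against previous-frame features warped to each depth candidate, and a per-pixel gate blends it with a monocular depth distribution. This is not BEVStereo++'s actual module. The dot-product matching score, the entropy-based gate, and every function name here are hypothetical assumptions chosen for clarity.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stereo_distribution(ref_feat: Sequence[float],
                        warped_feats: Sequence[Sequence[float]]) -> List[float]:
    """Depth distribution from temporal stereo matching (hypothetical helper).

    ref_feat: feature of one pixel in the current frame.
    warped_feats: for each depth candidate, the previous-frame feature sampled
    where this pixel would project if it lay at that depth. The warp assumes a
    static scene and known ego-motion, so moving objects tend to match poorly.
    Matching score = dot product (an assumption made for this sketch).
    """
    scores = [sum(a * b for a, b in zip(ref_feat, f)) for f in warped_feats]
    return softmax(scores)


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def stereo_gate(p_stereo: Sequence[float]) -> float:
    """Hypothetical confidence in [0, 1]: near 1 for a sharp match, 0 for uniform.

    Assumes at least two depth bins (log(1) would be a zero denominator).
    """
    max_entropy = math.log(len(p_stereo))
    return 1.0 - entropy(p_stereo) / max_entropy


def fuse_depth(p_mono: Sequence[float], p_stereo: Sequence[float],
               depth_bins: Sequence[float]) -> Tuple[float, List[float], float]:
    """Blend the two distributions with the gate.

    Returns (gate weight, fused distribution, expected depth).
    """
    w = stereo_gate(p_stereo)
    fused = [w * s + (1.0 - w) * m for s, m in zip(p_stereo, p_mono)]
    expected = sum(p * d for p, d in zip(fused, depth_bins))
    return w, fused, expected


if __name__ == "__main__":
    depth_bins = [10.0, 20.0, 30.0, 40.0]          # metres, toy values
    p_mono = softmax([0.0, 1.0, 0.0, 0.0])         # monocular guess peaks at 20 m
    ref = [1.0, 0.0]

    # Static scene: one depth candidate matches strongly.
    p_static = stereo_distribution(ref, [[0.0, 1.0], [0.0, 1.0], [5.0, 0.0], [0.0, 1.0]])
    print(fuse_depth(p_mono, p_static, depth_bins))

    # Moving object: every candidate matches equally, so stereo is uninformative.
    p_moving = stereo_distribution(ref, [[0.5, 0.0]] * 4)
    print(fuse_depth(p_mono, p_moving, depth_bins))
```

In the static case the stereo distribution is sharply peaked, the gate is above 0.9, and the expected depth moves to roughly 29 m, close to the matched 30 m bin. In the moving-object case all candidates score the same, the stereo distribution is uniform, the gate falls to about zero, and the fused result reduces to the monocular estimate. A learned gate would replace the entropy heuristic in practice; the heuristic is used here only because it needs no training.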
