CV · Dec 4, 2023

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

arXiv:2312.01696v2 · 58 citations · h-index: 12 · Has Code · CVPR
AI Analysis

This work improves 3D object detection for autonomous driving by modernizing dense BEV frameworks, though it is incremental as it builds on existing dense BEV methods.

The paper addresses the limitations of dense BEV-based 3D object detectors by introducing enhanced components, including a CRF-modulated depth estimation module and a two-stage object decoder. The resulting framework, BEVNeXt, achieves a state-of-the-art 64.2 NDS on the nuScenes test set.

Recently, the rise of query-based Transformer decoders has been reshaping camera-based 3D object detection. These query-based decoders are surpassing the traditional dense BEV (Bird's Eye View)-based methods. However, we argue that dense BEV frameworks remain important due to their outstanding abilities in depth estimation and object localization, depicting 3D scenes accurately and comprehensively. This paper aims to address the drawbacks of existing dense BEV-based 3D object detectors by introducing enhanced components, including a CRF-modulated depth estimation module enforcing object-level consistencies, a long-term temporal aggregation module with extended receptive fields, and a two-stage object decoder combining perspective techniques with CRF-modulated depth embedding. These enhancements lead to a "modernized" dense BEV framework dubbed BEVNeXt. On the nuScenes benchmark, BEVNeXt outperforms both BEV-based and query-based frameworks under various settings, achieving a state-of-the-art result of 64.2 NDS on the nuScenes test set. Code will be available at https://github.com/woxihuanjiangguo/BEVNeXt.
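
For readers unfamiliar with the headline metric, the sketch below shows how the nuScenes Detection Score (NDS) is composed from mAP and the five mean true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE), following the nuScenes benchmark definition. The function name `nds` and the input values are hypothetical and purely illustrative; they are not results from the paper.

```python
def nds(mean_ap, tp_errors):
    """Compute the nuScenes Detection Score.

    NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10

    mean_ap:   mean Average Precision in [0, 1]
    tp_errors: dict with the five mean true-positive errors
               (mATE, mASE, mAOE, mAVE, mAAE)
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error metrics")
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Illustrative inputs only (assumed values, not taken from the paper).
example_errors = {"mATE": 0.5, "mASE": 0.5, "mAOE": 0.5, "mAVE": 0.5, "mAAE": 0.5}
example_score = nds(0.5, example_errors)
```

Because mAP carries half of the total weight and each error is clipped at 1, a detector with poor velocity or orientation estimates can lose at most 0.1 NDS per metric, which is why NDS and mAP rankings on nuScenes can differ.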

Code Implementations: 1 repo
