CV | Oct 16, 2024

MambaBEV: An efficient 3D detection model with Mamba2

arXiv:2410.12673v2 | 6 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses accuracy issues in autonomous driving perception, particularly for large objects, but appears incremental as it adapts existing Mamba2 to BEV fusion.

The paper tackles global context modeling in 3D object detection for autonomous driving by introducing MambaBEV, which uses Mamba2 for temporal fusion and achieves an NDS of 51.7% and an mAP of 42.7% on the nuScenes dataset.

Accurate 3D object detection in autonomous driving relies on Bird's Eye View (BEV) perception and effective temporal fusion. However, existing fusion strategies based on convolutional layers or deformable self-attention struggle with global context modeling in BEV space, leading to lower accuracy for large objects. To address this, we introduce MambaBEV, a novel BEV-based 3D object detection model that leverages Mamba2, an advanced state space model (SSM) optimized for long-sequence processing. Our key contribution is TemporalMamba, a temporal fusion module that enhances global awareness by introducing a BEV feature discrete rearrangement mechanism tailored for Mamba's sequential processing. Additionally, we propose a Mamba-based DETR as the detection head to improve multi-object representation. Evaluations on the nuScenes dataset demonstrate that MambaBEV-base achieves an NDS of 51.7% and an mAP of 42.7%. Furthermore, an end-to-end autonomous driving paradigm validates its effectiveness in motion forecasting and planning. Our results highlight the potential of SSMs in autonomous driving perception, particularly in enhancing global context understanding and large-object detection.
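The abstract does not spell out how the rearrangement mechanism works, so the following is only a generic sketch of the underlying problem: a state space model consumes a 1D sequence, so a 2D BEV grid has to be serialized in some order and mapped back afterwards. The four scan orders and the names `scan_orders`, `flatten`, and `restore` are illustrative assumptions, not the paper's TemporalMamba design. Each integer stands in for one BEV cell's feature vector.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]          # H x W grid; each int stands in for a cell feature
Order = List[Tuple[int, int]]   # sequence of (row, col) positions


def scan_orders(height: int, width: int) -> Dict[str, Order]:
    """Build four hypothetical scan orders over an H x W BEV grid."""
    row_major = [(r, c) for r in range(height) for c in range(width)]
    col_major = [(r, c) for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def flatten(grid: Grid, order: Order) -> List[int]:
    """Serialize the grid into a 1D sequence following the given scan order."""
    return [grid[r][c] for r, c in order]


def restore(seq: List[int], order: Order, height: int, width: int) -> Grid:
    """Invert flatten: place each sequence element back at its grid position."""
    grid = [[0] * width for _ in range(height)]
    for value, (r, c) in zip(seq, order):
        grid[r][c] = value
    return grid


if __name__ == "__main__":
    bev = [[0, 1, 2],
           [3, 4, 5]]
    for name, order in scan_orders(2, 3).items():
        seq = flatten(bev, order)
        # Every scan order is a permutation, so the round trip is lossless.
        assert restore(seq, order, 2, 3) == bev
        print(name, seq)
```

Cells that are neighbours in the grid can end up far apart in any single sequence (cells 0 and 3 are adjacent vertically but three steps apart in the row-major scan), which is one reason sequence models applied to 2D features commonly combine several scan directions.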

