CV · LG · Mar 18, 2025

MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations

arXiv:2503.13858v2 · 6 citations · h-index: 13 · ICLR
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks in autonomous driving systems, though it appears incremental, since it adapts state space models to an existing BEV paradigm.

The paper tackles the challenge of computational efficiency in 3D visual perception for autonomous driving by proposing MamBEV, a Mamba-based framework that learns unified Bird's Eye View representations, resulting in significantly improved computational and memory efficiency across multiple tasks.

3D visual perception tasks, such as 3D detection from multi-camera images, are essential components of autonomous driving and assistance systems. However, designing computationally efficient methods remains a significant challenge. In this paper, we propose a Mamba-based framework called MamBEV, which learns unified Bird's Eye View (BEV) representations using linear spatio-temporal SSM-based attention. This approach supports multiple 3D perception tasks with significantly improved computational and memory efficiency. Furthermore, we introduce SSM-based cross-attention, analogous to standard cross-attention, in which BEV query representations interact with relevant image features. Extensive experiments demonstrate MamBEV's promising performance across diverse visual perception metrics, highlighting its advantages in input scaling efficiency compared to existing benchmark models.
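The abstract gives no equations, so the following is only a generic sketch, not MamBEV's implementation. It shows a simplified Mamba-style selective scan in NumPy, whose cost grows linearly with sequence length (O(L·D·N) for L tokens, D channels, and state size N, versus O(L_q·L_kv·D) for softmax cross-attention). It also shows one hypothetical way an SSM can stand in for cross-attention: image tokens are placed before the BEV query tokens, so each query's output reads image information through the recurrent state. The function names (`selective_scan`, `ssm_cross_attend`), the parameter shapes, the Euler-style discretization of B, and the image-then-query token ordering are all assumptions made for illustration.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def selective_scan(x, A, W_B, W_C, W_dt, b_dt):
    """Simplified selective (input-dependent) diagonal SSM scan.

    x:    (L, D) sequence of L tokens with D channels
    A:    (D, N) negative real diagonal state matrix per channel
    W_B, W_C: (D, N) projections producing input-dependent B_t, C_t in R^N
    W_dt: (D, D), b_dt: (D,) projection producing a per-channel step size
    Returns y: (L, D). One pass over the tokens, so cost is linear in L.
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))           # hidden state, one N-vector per channel
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]                            # (D,)
        dt = softplus(xt @ W_dt + b_dt)      # (D,) positive step sizes
        B = xt @ W_B                         # (N,) input-dependent input matrix
        C = xt @ W_C                         # (N,) input-dependent readout
        A_bar = np.exp(dt[:, None] * A)      # (D, N) zero-order-hold decay
        h = A_bar * h + (dt[:, None] * B[None, :]) * xt[:, None]  # Euler step for B
        y[t] = h @ C                         # (D,) readout for token t
    return y

def ssm_cross_attend(queries, image_feats, params):
    # Hypothetical SSM "cross-attention": scan image tokens first, then
    # queries, and keep only the query outputs, which depend on the image
    # tokens through the carried hidden state.
    seq = np.concatenate([image_feats, queries], axis=0)
    out = selective_scan(seq, *params)
    return out[len(image_feats):]

rng = np.random.default_rng(0)
D, N = 8, 4
params = (-np.exp(rng.normal(size=(D, N))),   # A < 0 keeps the scan stable
          rng.normal(size=(D, N)) * 0.1,       # W_B
          rng.normal(size=(D, N)) * 0.1,       # W_C
          rng.normal(size=(D, D)) * 0.1,       # W_dt
          np.zeros(D))                         # b_dt
img = rng.normal(size=(100, D))               # flattened image feature tokens
bev = rng.normal(size=(16, D))                # BEV query tokens
print(ssm_cross_attend(bev, img, params).shape)  # (16, 8)
```

Because the scan is causal, the order in which image and query tokens are fed determines what each query can see. Bidirectional or multi-direction scans are a common way to relax this, but whether MamBEV uses them is not stated here.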

Code Implementations: 1 repo