RO · CV · Mar 5, 2025

BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving

arXiv:2503.03074v1 · 27 citations · h-index: 3 · IROS
Originality: Incremental advance
AI Analysis

This work targets robust and transparent autonomous driving by combining bird's-eye-view (BEV) maps with LLMs, an incremental advance in integrating perception and reasoning.

The paper tackles the challenge of combining 3D spatial grounding with the reasoning and language capabilities of LLMs in autonomous driving. It introduces BEVDriver, an LLM-based model for closed-loop driving that takes latent BEV features as input and achieves up to 18.9% higher Driving Score on the LangAuto benchmark than state-of-the-art methods.

Autonomous driving has the potential to set the stage for more efficient future mobility, requiring the research domain to establish trust through safe, reliable and transparent driving. Large Language Models (LLMs) possess reasoning capabilities and natural language understanding, presenting the potential to serve as generalized decision-makers for ego-motion planning that can interact with humans and navigate environments designed for human drivers. While this research avenue is promising, current autonomous driving approaches are challenged by combining 3D spatial grounding and the reasoning and language capabilities of LLMs. We introduce BEVDriver, an LLM-based model for end-to-end closed-loop driving in CARLA that utilizes latent BEV features as perception input. BEVDriver includes a BEV encoder to efficiently process multi-view images and 3D LiDAR point clouds. Within a common latent space, the BEV features are propagated through a Q-Former to align with natural language instructions and passed to the LLM that predicts and plans precise future trajectories while considering navigation instructions and critical scenarios. On the LangAuto benchmark, our model reaches up to 18.9% higher performance on the Driving Score compared to SoTA methods.
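
The Q-Former step described in the abstract can be illustrated with a minimal sketch. It assumes the common Q-Former pattern in which a small set of query vectors cross-attends over flattened BEV feature tokens to produce a fixed-length prefix for the LLM. This is not the authors' implementation: the function names, the grid and query sizes, the random features, and the omission of learned projection weights and multiple heads are all simplifying assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_bev(grid):
    """Flatten an H x W grid of feature vectors into H*W tokens."""
    return [cell for row in grid for cell in row]


def cross_attend(queries, bev_tokens):
    """Single-head cross-attention without learned projections (assumption):
    each query becomes a softmax-weighted average of the BEV tokens."""
    dim = len(bev_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        # Scaled dot-product similarity between this query and every token.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, tok))
                  for tok in bev_tokens]
        weights = softmax(scores)
        # Weighted average of token features, one value per feature dimension.
        outputs.append([
            sum(w * tok[d] for w, tok in zip(weights, bev_tokens))
            for d in range(dim)
        ])
    return outputs


# Hypothetical sizes; random values stand in for real BEV encoder output.
rng = random.Random(0)
DIM, HEIGHT, WIDTH, NUM_QUERIES = 8, 4, 4, 3
bev_grid = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(WIDTH)]
            for _ in range(HEIGHT)]
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

# 16 BEV tokens are compressed into 3 fixed-length vectors for the LLM.
llm_prefix = cross_attend(queries, flatten_bev(bev_grid))
print(len(llm_prefix), len(llm_prefix[0]))  # 3 8
```

The point of the sketch is the shape change: however large the BEV grid is, the number of vectors handed to the LLM equals the number of queries, which keeps the LLM input length fixed.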

Code Implementations: 1 repo