CV | LG | Dec 22, 2023

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

arXiv:2312.14919v3 | 18 citations | h-index: 4 | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Highly original
AI Analysis

This paper addresses a key perception challenge for autonomous driving systems by simplifying the fusion architecture while improving performance.

The paper tackles camera-lidar fusion for autonomous driving. It argues that monocular depth estimation may be an unnecessary bottleneck and introduces a transformer-based method that bypasses it, achieving better 3D object detection on the nuScenes dataset than depth-based baselines.

Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation, which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance. Strikingly, we also find that removing depth estimation altogether does not degrade object detection performance substantially, suggesting that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.
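
To make the idea of attention-based fusion in a bird's-eye-view grid concrete, here is a minimal sketch of generic scaled dot-product cross-attention, in which each BEV cell queries a set of camera features and the attended result is concatenated onto the cell's own feature. It is not the paper's architecture: the function names (`cross_attend`, `rand_matrix`), the feature sizes, the random inputs, the use of lidar-derived BEV features as queries, and fusion by concatenation are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(bev_queries, cam_keys, cam_values):
    """Attend from BEV cells to camera features (hypothetical helper).

    bev_queries: one feature vector per BEV cell (assumed lidar-derived).
    cam_keys, cam_values: one key and one value vector per camera feature.
    Returns the fused per-cell features and the attention weights.
    """
    d = len(cam_keys[0])
    n_channels = len(cam_values[0])
    fused, all_weights = [], []
    for q in bev_queries:
        # Scaled dot-product score between this BEV cell and every camera feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in cam_keys]
        w = softmax(scores)
        # Weighted sum of camera values: the camera evidence selected for this cell.
        attended = [sum(wj * v[c] for wj, v in zip(w, cam_values)) for c in range(n_channels)]
        # Assumed fusion step: concatenate the BEV feature with the attended camera feature.
        fused.append(list(q) + attended)
        all_weights.append(w)
    return fused, all_weights


def rand_matrix(rows, cols):
    """Random stand-in features; real features would come from learned encoders."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
bev_queries = rand_matrix(16, 4)   # 16 BEV cells, 4-dim queries
cam_keys = rand_matrix(24, 4)      # 24 camera features, 4-dim keys
cam_values = rand_matrix(24, 6)    # 24 camera features, 6-dim values
fused, weights = cross_attend(bev_queries, cam_keys, cam_values)
print(len(fused), len(fused[0]))
```

Because the weights for each cell sum to one, a cell can concentrate on a few camera features or spread its attention broadly. The sketch does not show how the actual model modulates camera use according to lidar availability.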

