CVSep 21, 2024

LFP: Efficient and Accurate End-to-End Lane-Level Planning via Camera-LiDAR Fusion

arXiv:2409.14170v1h-index: 11
Originality Incremental advance
AI Analysis

This addresses efficiency and performance bottlenecks in autonomous driving planning for real-time applications, though it builds incrementally on existing camera-only frameworks.

The paper tackles inefficiencies in multi-modal autonomous driving systems by proposing LFP, a camera-LiDAR fusion method that uses lanes as the fusion unit to reduce redundant processing and enhance feature interaction, achieving state-of-the-art performance with up to 15% improvement in driving score and 14% in infraction score while maintaining 19.27 FPS.

Multi-modal systems enhance performance in autonomous driving but face inefficiencies due to indiscriminate processing within each modality. Additionally, the independent feature learning of each modality lacks interaction, which results in extracted features that do not possess the complementary characteristics. These issue increases the cost of fusing redundant information across modalities. To address these challenges, we propose targeting driving-relevant elements, which reduces the volume of LiDAR features while preserving critical information. This approach enhances lane level interaction between the image and LiDAR branches, allowing for the extraction and fusion of their respective advantageous features. Building upon the camera-only framework PHP, we introduce the Lane-level camera-LiDAR Fusion Planning (LFP) method, which balances efficiency with performance by using lanes as the unit for sensor fusion. Specifically, we design three modules to enhance efficiency and performance. For efficiency, we propose an image-guided coarse lane prior generation module that forecasts the region of interest (ROI) for lanes and assigns a confidence score, guiding LiDAR processing. The LiDAR feature extraction modules leverages lane-aware priors from the image branch to guide sampling for pillar, retaining essential pillars. For performance, the lane-level cross-modal query integration and feature enhancement module uses confidence score from ROI to combine low-confidence image queries with LiDAR queries, extracting complementary depth features. These features enhance the low-confidence image features, compensating for the lack of depth. Experiments on the Carla benchmarks show that our method achieves state-of-the-art performance in both driving score and infraction score, with maximum improvement of 15% and 14% over existing algorithms, respectively, maintaining high frame rate of 19.27 FPS.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes