FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving
This paper addresses the largely unexplored challenge of integrating multi-sensor data for prediction and planning in autonomous driving, presenting a novel approach rather than an incremental improvement.
The paper tackles the problem of jointly optimizing prediction and planning in autonomous driving by fusing camera and LiDAR data. It achieves state-of-the-art performance, with improvements such as a 15% average gain on perception tasks, a reduction in prediction error (ADE) from 0.708 to 0.389, and a drop in collision rate from 0.31% to 0.12%.
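As a worked check on these numbers, the sketch below computes average displacement error (ADE), the mean Euclidean distance between predicted and ground-truth positions over a trajectory's timesteps, and the relative reductions implied by the reported ADE and collision-rate figures. The toy trajectories are made up for illustration, and the function names `ade` and `relative_reduction` are hypothetical. Multi-modal forecasting benchmarks often report minADE (the best of K predicted trajectories). This single-trajectory sketch does not cover that variant and makes no claim about which variant the paper uses.

```python
import numpy as np


def ade(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average displacement error for one trajectory.

    pred, gt: arrays of shape (T, 2) holding (x, y) positions per timestep.
    Returns the mean Euclidean distance between matching positions.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction when a metric drops from `before` to `after`."""
    return (before - after) / before


# Hypothetical toy trajectories (not from the paper), 3 timesteps each.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 0.3], [2.0, 0.4]])
print(round(ade(pred, gt), 4))  # per-step errors 0, 0.3, 0.4 -> 0.2333

# Relative improvements implied by the reported figures.
print(f"ADE: {relative_reduction(0.708, 0.389):.1%}")            # ADE: 45.1%
print(f"Collision rate: {relative_reduction(0.31, 0.12):.1%}")   # Collision rate: 61.3%
```

By this arithmetic, the reported figures correspond to roughly a 45% relative reduction in ADE and a 61% relative reduction in collision rate.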
Building a multi-modality, multi-task neural network toward accurate and robust performance is a de facto standard in the perception tasks of autonomous driving. However, leveraging data from multiple sensors to jointly optimize the prediction and planning tasks remains largely unexplored. In this paper, we present FusionAD, to the best of our knowledge the first unified framework that fuses information from the two most critical sensors, camera and LiDAR, and goes beyond the perception task. Concretely, we first build a transformer-based multi-modality fusion network to effectively produce fusion-based features. In contrast to the camera-based end-to-end method UniAD, we then establish fusion-aided modality-aware prediction and status-aware planning modules, dubbed FMSPnP, that take advantage of the multi-modality features. We conduct extensive experiments on the commonly used nuScenes benchmark dataset. FusionAD achieves state-of-the-art performance, surpassing baselines by an average of 15% on perception tasks such as detection and tracking and by 10% on occupancy prediction accuracy, reducing the prediction error from 0.708 to 0.389 in ADE, and reducing the collision rate from 0.31% to only 0.12%.
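The abstract does not specify how the transformer fuses the two modalities. As a rough illustration only, and not FusionAD's actual design, the sketch below shows one generic pattern: a set of bird's-eye-view (BEV) query vectors cross-attends separately to camera tokens and to LiDAR tokens using scaled dot-product attention, and both results are added back to the queries as a residual. The function names (`cross_attention`, `fuse`), the token counts, the per-modality projections, and the random matrices standing in for learned weights are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: (Nq, d), (Nk, d), (Nk, d) -> (Nq, d)."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))  # each row sums to 1
    return weights @ v


def fuse(bev: np.ndarray, cam: np.ndarray, lidar: np.ndarray,
         rng: np.random.Generator) -> np.ndarray:
    """Hypothetical fusion step: BEV queries attend to each modality
    separately, and both attended outputs are summed with a residual."""
    d = bev.shape[-1]

    def proj() -> np.ndarray:
        # Random projection standing in for a learned weight matrix.
        return rng.standard_normal((d, d)) / np.sqrt(d)

    out = bev.copy()
    for tokens in (cam, lidar):
        w_q, w_k, w_v = proj(), proj(), proj()  # separate weights per modality
        out = out + cross_attention(bev @ w_q, tokens @ w_k, tokens @ w_v)
    return out


rng = np.random.default_rng(0)
d = 16
bev = rng.standard_normal((4, d))     # 4 hypothetical BEV grid-cell queries
cam = rng.standard_normal((10, d))    # 10 hypothetical camera feature tokens
lidar = rng.standard_normal((6, d))   # 6 hypothetical LiDAR feature tokens
fused = fuse(bev, cam, lidar, rng)
print(fused.shape)  # (4, 16)
```

Keeping the attention per modality, rather than concatenating all tokens into one key set, lets each sensor's tokens be projected differently; a real system would learn these projections and typically add normalization, feed-forward layers, and positional encodings that this sketch omits.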