Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles
The paper addresses the problem of improving environmental perception for autonomous driving systems, but its contribution is incremental, as it primarily reviews existing methods.
This paper reviews multi-sensor fusion techniques for autonomous vehicles, categorizing strategies into data-level, feature-level, and decision-level fusion, and discussing deep learning methods, datasets, and emerging trends such as Vision-Language Models that aim to enhance perception and robustness.
Multi-sensor fusion plays a critical role in enhancing perception for autonomous driving: it overcomes the limitations of individual sensors and enables comprehensive environmental understanding. This paper first formalizes multi-sensor fusion strategies into data-level, feature-level, and decision-level categories and then provides a systematic review of deep learning-based methods corresponding to each strategy. We present key multi-modal datasets and discuss their applicability to real-world challenges, particularly adverse weather conditions and complex urban environments. Additionally, we explore emerging trends, including the integration of Vision-Language Models (VLMs) and Large Language Models (LLMs) and the role of sensor fusion in end-to-end autonomous driving, highlighting the potential of sensor fusion to enhance system adaptability and robustness. Our work offers insights into current methods and future directions for multi-sensor fusion in autonomous driving.
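To make the three fusion levels concrete, the following is a minimal illustrative sketch and not a method taken from the paper. It assumes two hypothetical sensors (a camera and a LiDAR) that each report a confidence value per grid cell; the function names, the toy feature extractor, the 0.5 threshold, and the OR rule are all assumptions chosen for illustration. The only difference between the three functions is the stage at which the two sensor streams are combined.

```python
from statistics import mean

# Hypothetical toy inputs: per-cell obstacle confidence from two sensors.
camera_grid = [0.9, 0.2, 0.6, 0.1]
lidar_grid = [0.7, 0.1, 0.4, 0.6]


def data_level_fusion(a, b):
    """Combine raw measurements before any processing (here: pair them per cell)."""
    return list(zip(a, b))


def extract_features(grid):
    """Stand-in feature extractor (assumption): mean and max confidence."""
    return [mean(grid), max(grid)]


def feature_level_fusion(a, b):
    """Extract features per sensor first, then concatenate the feature vectors."""
    return extract_features(a) + extract_features(b)


def detect(grid, threshold=0.5):
    """Stand-in per-sensor detector (assumption): threshold each cell."""
    return [value > threshold for value in grid]


def decision_level_fusion(a, b):
    """Run each sensor's detector independently, then merge decisions with an OR rule."""
    return [x or y for x, y in zip(detect(a), detect(b))]


if __name__ == "__main__":
    print("data-level:    ", data_level_fusion(camera_grid, lidar_grid))
    print("feature-level: ", feature_level_fusion(camera_grid, lidar_grid))
    print("decision-level:", decision_level_fusion(camera_grid, lidar_grid))
```

In a deep learning setting, the stand-in extractor and detector would be learned networks, but the point at which the modalities are merged (raw data, intermediate features, or final decisions) follows the same pattern.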