FIN: Fast Inference Network for Map Segmentation
This work addresses the need for fast and accurate map segmentation in autonomous vehicles. It is an incremental improvement, with gains in both inference speed and segmentation accuracy.
The paper tackles the problem of combining high accuracy with real-time performance in map segmentation for autonomous vehicles. It proposes an efficient architecture based on camera-radar fusion in the BEV space, which reaches 53.5 mIoU and improves inference time by 260% over the strongest baselines.
Multi-sensor fusion is becoming increasingly common in autonomous vehicles because it makes several perception tasks more robust. Its value comes from the distinct contribution of each sensor: camera-radar fusion is a cost-effective option that combines the rich semantic information from cameras with the accurate distance measurements from radar, without incurring excessive financial costs or overwhelming data processing requirements. Map segmentation is critical for enabling a vehicle to behave effectively in its environment, yet achieving high accuracy while meeting real-time performance requirements remains a significant challenge. Therefore, this work presents a novel and efficient real-time map segmentation architecture that uses cameras and radars in the \acrfull{bev} space and is designed with high accuracy, per-class balancing, and inference time in mind. To accomplish this, we combine a set of advanced losses with a new lightweight head to improve the perception results. Our results show that, with these modifications, our approach achieves results comparable to large models, reaching 53.5 mIoU, while also setting a new benchmark for inference time, improving it by 260\% over the strongest baseline models.
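For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection-over-union (mIoU) score can be computed for BEV map segmentation. It assumes one binary mask per map class, stored as an array of shape (classes, height, width); this layout and the function name `mean_iou` are illustrative assumptions, not the evaluation code used in this work.

```python
import numpy as np


def mean_iou(pred, target):
    """Return (per-class IoU, mIoU in percent) for binary BEV masks.

    pred, target: arrays of shape (C, H, W), one binary mask per class.
    This layout is an assumption made for illustration only.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    # Count intersecting and united cells separately for every class.
    inter = np.logical_and(pred, target).sum(axis=(1, 2))
    union = np.logical_or(pred, target).sum(axis=(1, 2))

    # Classes absent from both prediction and ground truth have an
    # undefined IoU; mark them as NaN and skip them in the mean.
    iou = np.full(union.shape, np.nan, dtype=float)
    valid = union > 0
    iou[valid] = inter[valid] / union[valid]

    miou = 100.0 * np.nanmean(iou) if valid.any() else float("nan")
    return iou, miou


# Toy example with two classes on a 2x2 BEV grid.
pred = np.array([[[1, 1], [0, 0]],
                 [[1, 1], [1, 1]]])
target = np.array([[[1, 0], [0, 0]],
                   [[1, 1], [1, 1]]])
per_class, miou = mean_iou(pred, target)
print(per_class, miou)
```

Averaging per-class scores, rather than pooling all cells, gives rare map classes the same weight as frequent ones, which is why the metric is sensitive to per-class balance.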