Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception Proposal
This work addresses perception for autonomous driving systems. Its contribution appears incremental: it combines existing methods (CNNs, PointNet, and Kalman filters), and the novelty lies mainly in the sensor fusion configuration.
The paper tackles 3D object detection and tracking for autonomous vehicles. It proposes a multi-modal 360-degree framework that integrates CNN-based instance segmentation, LiDAR-to-image association, a PointNet ensemble for 3D bounding box and pose estimation, and an Unscented Kalman Filter for tracking. The authors report accurate and reliable road environment detection, validated in real-world tests on an autonomous vehicle.
In this paper, a multi-modal 360$^{\circ}$ framework for 3D object detection and tracking for autonomous vehicles is presented. The process is divided into four main stages. First, images are fed into a CNN to obtain an instance segmentation of the surrounding road participants. Second, LiDAR-to-image association is performed for the estimated mask proposals. Third, the isolated points of every object are processed by a PointNet ensemble to compute their corresponding 3D bounding boxes and poses. Lastly, a tracking stage based on an Unscented Kalman Filter tracks the agents over time. The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection. A wide variety of tests of the system, deployed in an autonomous vehicle, have confirmed the suitability of the proposed perception stack for a real autonomous driving application.
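
The second stage, LiDAR-to-image association, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain pinhole camera model, assumes the LiDAR points have already been transformed into the camera frame by an extrinsic calibration, and represents an instance mask as a nested list of booleans. The names `project_point` and `points_in_mask` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_point(point_cam: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a 3D point given in the camera frame (x right, y down,
    z forward) to pixel coordinates with a pinhole model (assumption).

    Returns None for points at or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0.0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def points_in_mask(points_cam: Sequence[Point3D],
                   mask: Sequence[Sequence[bool]],
                   fx: float, fy: float,
                   cx: float, cy: float) -> List[Point3D]:
    """Keep the points whose projection lands on a True pixel of the mask.

    `mask` is indexed as mask[row][col], i.e. mask[v][u].
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    selected: List[Point3D] = []
    for point in points_cam:
        pixel = project_point(point, fx, fy, cx, cy)
        if pixel is None:
            continue
        # Floor to the containing pixel; math.floor handles negatives correctly.
        col, row = math.floor(pixel[0]), math.floor(pixel[1])
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            selected.append(point)
    return selected
```

Under these assumptions, the points returned for each mask form the isolated per-object point set that a later stage, such as the PointNet ensemble described above, would consume. A real system would also need to handle lens distortion, multiple cameras for 360$^{\circ}$ coverage, and background points that project inside the mask.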