VANETs Meet Autonomous Vehicles: A Multimodal 3D Environment Learning Approach
This work addresses safety and perception challenges in autonomous driving by integrating multimodal data; its contribution is incremental, since it builds on existing fusion techniques.
The paper tackles object detection and mapping for autonomous vehicles by fusing stereo camera, Lidar, and V2V communication data, and reports improved accuracy from a semi-supervised manifold alignment approach evaluated on the KITTI dataset.
In this paper, we design a multimodal framework for object detection, recognition, and mapping based on the fusion of stereo camera frames, Velodyne Lidar point cloud scans, and Vehicle-to-Vehicle (V2V) Basic Safety Messages (BSMs) exchanged using Dedicated Short Range Communication (DSRC). We merge the key features of each modality: rich texture descriptions of objects from 2D images, depth and inter-object distance from the 3D point cloud, and awareness of hidden vehicles from the 3D information in BSMs. We present joint pixel-to-point-cloud and pixel-to-V2V correspondences of objects in frames from the KITTI Vision Benchmark Suite, using a semi-supervised manifold alignment approach to achieve camera-Lidar and camera-V2V mapping of recognized objects that share the same underlying manifold.
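To make the idea of semi-supervised manifold alignment concrete, the following is a minimal sketch on toy data, not the pipeline used in this work. It assumes a common graph-Laplacian formulation: each modality gets a k-nearest-neighbour graph, a few known cross-modal pairs are tied together with a penalty weight, and the low-order eigenvectors of the joint Laplacian give a shared embedding. The function names (`knn_laplacian`, `align`), the parameters `k` and `mu`, and the synthetic 2D and 3D point sets standing in for image and point cloud features are all hypothetical.

```python
import numpy as np

def knn_laplacian(P, k=3):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbour graph."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    W = np.zeros_like(d2)
    for i in range(len(P)):
        # Position 0 of the sorted row is the point itself, so skip it.
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W

def align(X, Y, pairs, dim=1, k=3, mu=10.0):
    """Embed X and Y in a shared space using a few labeled pairs (i, j).

    Assumes the joint graph is connected, so that only the first
    eigenvector (the constant one) has to be discarded.
    """
    nx = len(X)
    n = nx + len(Y)
    L = np.zeros((n, n))
    L[:nx, :nx] = knn_laplacian(X, k)
    L[nx:, nx:] = knn_laplacian(Y, k)
    for i, j in pairs:
        # Penalty mu * (f_x[i] - f_y[j])^2 pulls known matches together.
        L[i, i] += mu
        L[nx + j, nx + j] += mu
        L[i, nx + j] -= mu
        L[nx + j, i] -= mu
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    emb = vecs[:, 1:dim + 1]         # drop the constant eigenvector
    return emb[:nx], emb[nx:]

# Toy data: the same curve observed in 2D ("pixel" side) and, after a
# rotation into 3D and a rescaling, on a hypothetical "point cloud" side.
t = np.linspace(0.0, 1.0, 30) ** 1.5
X = np.stack([t, 0.3 * t ** 2], axis=1)
R = np.array([[np.cos(0.7), np.sin(0.7), 0.0],
              [0.0, 0.0, 1.0]])
Y = 3.0 * X @ R

# Only a subset of correspondences is labeled (the semi-supervised part).
pairs = [(i, i) for i in (0, 5, 10, 15, 20, 25, 29)]
fx, fy = align(X, Y, pairs)

# Match every Y point to its nearest X point in the shared embedding.
dist = np.linalg.norm(fy[:, None, :] - fx[None, :, :], axis=-1)
match = np.argmin(dist, axis=1)
```

Here `match[j]` is the index of the 2D point assigned to 3D point `j`; on this toy curve the unlabeled points land on or next to their true counterparts. Real camera, Lidar, and BSM features would need modality-specific descriptors and tuning of `k`, `mu`, and `dim`, which this sketch does not attempt.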