CV · LG · Nov 17, 2017

Fusing Bird View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection

arXiv:1711.06703v3 · 29 citations
Originality: Incremental advance
AI Analysis

This is an incremental improvement for autonomous driving systems, enhancing pedestrian detection through sensor fusion.

The paper tackles 3D object detection for autonomous driving by fusing LIDAR point clouds and camera images using a non-homogeneous pooling layer in a CNN, showing improved pedestrian detection on the KITTI dataset.

We propose a new method for fusing a LIDAR point cloud and camera-captured images in a deep convolutional neural network (CNN). The proposed method constructs a new layer, called the non-homogeneous pooling layer, to transform features between the bird view map and the front view map. The sparse LIDAR point cloud is used to construct the mapping between the two maps. The pooling layer allows efficient fusion of the bird view and front view features at any stage of the network. This is favorable for 3D object detection using camera-LIDAR fusion in autonomous driving scenarios. A corresponding deep CNN, which produces 3D bounding boxes from the bird view map, is designed and tested on the KITTI bird view object detection dataset. The fusion method shows particular benefit for the detection of pedestrians in the bird view compared to other fusion-based object detection networks.
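The point-based mapping described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes each LIDAR point has already been projected to a front view pixel and assigned to a bird view grid cell, and it assumes average pooling over the points in each cell, since the abstract does not specify the pooling operator. The function name `pool_front_to_bird` and its arguments are hypothetical.

```python
import numpy as np


def pool_front_to_bird(front_feat, fv_rc, bv_rc, bv_shape):
    """Move front view features onto a bird view grid via LIDAR points.

    front_feat: (H_f, W_f, C) front view feature map.
    fv_rc:      (N, 2) integer (row, col) of each point in the front view.
    bv_rc:      (N, 2) integer (row, col) of each point in the bird view.
    bv_shape:   (H_b, W_b) size of the bird view grid.

    Assumption: features of all points landing in one bird view cell are
    averaged; cells that contain no point stay zero.
    """
    h_b, w_b = bv_shape
    channels = front_feat.shape[-1]

    # Gather one front view feature vector per LIDAR point: (N, C).
    gathered = front_feat[fv_rc[:, 0], fv_rc[:, 1]]

    # Flatten the bird view cell index so we can scatter with np.add.at.
    flat_cell = bv_rc[:, 0] * w_b + bv_rc[:, 1]

    sums = np.zeros((h_b * w_b, channels), dtype=np.float64)
    counts = np.zeros(h_b * w_b, dtype=np.float64)

    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(sums, flat_cell, gathered)
    np.add.at(counts, flat_cell, 1.0)

    # Avoid division by zero for empty cells.
    pooled = sums / np.maximum(counts, 1.0)[:, None]
    return pooled.reshape(h_b, w_b, channels)


# Toy example: three points, two of which fall into bird view cell (0, 0).
front = np.arange(2 * 3 * 2, dtype=np.float64).reshape(2, 3, 2)
fv_points = np.array([[0, 0], [0, 1], [1, 2]])
bv_points = np.array([[0, 0], [0, 0], [1, 1]])
bird = pool_front_to_bird(front, fv_points, bv_points, (2, 2))
```

Because the number of points per cell varies, the pooling region is different for every output location. The same gather-and-scatter pattern works in the other direction (bird view to front view) by swapping the two index arrays.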

