CV · Jul 17, 2020

EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection

arXiv:2007.08856v1 · 475 citations · Has Code
AI Analysis

This work improves 3D object detection for autonomous driving and robotics by enhancing sensor fusion and confidence consistency, though it is incremental as it builds on existing multi-sensor approaches.

The paper tackles 3D object detection by fusing LiDAR point cloud and camera image data without requiring image annotations, and by addressing the inconsistency between localization and classification confidence. It reports state-of-the-art results on the KITTI and SUN-RGBD datasets.

In this paper, we aim at addressing two critical issues in the 3D detection task, including the exploitation of multiple sensors (namely LiDAR point cloud and camera image), as well as the inconsistency between the localization and classification confidence. To this end, we propose a novel fusion module to enhance the point features with semantic image features in a point-wise manner without any image annotations. Besides, a consistency enforcing loss is employed to explicitly encourage the consistency of both the localization and classification confidence. We design an end-to-end learnable framework named EPNet to integrate these two components. Extensive experiments on the KITTI and SUN-RGBD datasets demonstrate the superiority of EPNet over the state-of-the-art methods. Code and models are available at: https://github.com/happinesslz/EPNet.
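The two ideas in the abstract can be illustrated with a minimal sketch. It is not the EPNet implementation: the function names, the scalar `gate` (a stand-in for a learned weight), and the specific loss form `-log(confidence * IoU)` are assumptions chosen for illustration. The sketch samples an image feature at the pixel a LiDAR point projects to, appends it to the point feature, and computes a loss that is small only when classification confidence and localization quality (IoU) are both high.

```python
import math


def bilinear_sample(feature_map, u, v):
    """Sample an H x W x C feature map (nested lists) at continuous pixel (u, v)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid image area so border points still receive a feature.
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    channels = len(feature_map[0][0])
    return [
        (1 - ax) * (1 - ay) * feature_map[y0][x0][c]
        + ax * (1 - ay) * feature_map[y0][x1][c]
        + (1 - ax) * ay * feature_map[y1][x0][c]
        + ax * ay * feature_map[y1][x1][c]
        for c in range(channels)
    ]


def fuse_point_feature(point_feature, image_feature, gate):
    """Concatenate a point feature with a gated image feature.

    `gate` in [0, 1] is a hypothetical stand-in for a learned weight that
    decides how much the image feature is trusted for this point.
    """
    return list(point_feature) + [gate * f for f in image_feature]


def consistency_loss(cls_confidence, iou, eps=1e-6):
    """Assumed form: -log(confidence * IoU), low only when both terms are high."""
    return -math.log(max(cls_confidence * iou, eps))
```

For example, with a 2 x 2 single-channel feature map, a point projecting to pixel (0.5, 0.5) receives the average of the four neighbouring features, and a box with confidence 0.9 but IoU 0.3 is penalized more than one with confidence 0.9 and IoU 0.9.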

Code Implementations: 1 repo
