CV · AI · Sep 23, 2024

Image-Guided Semantic Pseudo-LiDAR Point Generation for 3D Object Detection

arXiv:2409.14985v4 · 1 citation · h-index: 12 · Has Code
Originality: Highly original
AI Analysis

This paper addresses a critical safety issue in autonomous driving by improving 3D object detection for small and distant objects such as pedestrians and cyclists. It is assessed as a novel method rather than an incremental improvement.

The paper tackles the detection of small or distant objects in autonomous driving by using image features to generate dense, semantically meaningful 3D points. Reported results include a reduction in false positives of nearly 50% and state-of-the-art cyclist performance on the KITTI benchmark.

In autonomous driving scenarios, accurate perception is becoming an even more critical task for safe navigation. While LiDAR provides precise spatial data, its inherent sparsity makes it difficult to detect small or distant objects. Existing methods try to address this by generating additional points within a Region of Interest (RoI), but relying on LiDAR alone often leads to false positives and a failure to recover meaningful structures. To address these limitations, we propose the Image-Guided Semantic Pseudo-LiDAR Point Generation model, called ImagePG, a novel framework that leverages rich RGB image features to generate dense and semantically meaningful 3D points. Our framework includes an Image-Guided RoI Points Generation (IG-RPG) module, which creates pseudo-points guided by image features, and an Image-Aware Occupancy Prediction Network (I-OPN), which provides spatial priors to guide point placement. A multi-stage refinement (MR) module further enhances point quality and detection robustness. To the best of our knowledge, ImagePG is the first method to directly leverage image features for point generation. Extensive experiments on the KITTI and Waymo datasets demonstrate that ImagePG significantly improves the detection of small and distant objects such as pedestrians and cyclists, reducing false positives by nearly 50%. On the KITTI test set, our framework improves mAP over the baseline by +1.38%p (car), +7.91%p (pedestrian), and +5.21%p (cyclist), achieving state-of-the-art cyclist performance on the KITTI leaderboard. The code is available at: https://github.com/MS-LIMA/ImagePG
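The idea of an occupancy prior guiding point placement inside an RoI can be illustrated with a toy sketch. This is not the authors' implementation: in the paper the occupancy comes from a learned, image-aware network, whereas here it is a hand-written grid, and the function name `place_pseudo_points`, its parameters, the axis-aligned RoI layout, and the uniform per-voxel sampling are all assumptions made for illustration.

```python
import random
from itertools import product


def place_pseudo_points(roi_min, roi_max, occupancy,
                        threshold=0.5, per_voxel=2, seed=0):
    """Sample pseudo-points in RoI voxels whose occupancy passes a threshold.

    roi_min, roi_max: (x, y, z) corners of an axis-aligned RoI (assumed layout).
    occupancy: nested list indexed [ix][iy][iz] with scores in [0, 1]; a
               stand-in for the output of an occupancy predictor.
    """
    rng = random.Random(seed)
    dims = (len(occupancy), len(occupancy[0]), len(occupancy[0][0]))
    # Edge length of one voxel along each axis.
    size = [(hi - lo) / n for lo, hi, n in zip(roi_min, roi_max, dims)]
    points = []
    for ix, iy, iz in product(*(range(n) for n in dims)):
        if occupancy[ix][iy][iz] < threshold:
            continue  # skip voxels the prior considers empty
        for _ in range(per_voxel):
            # Draw one point uniformly inside the selected voxel.
            points.append(tuple(
                lo + (i + rng.random()) * s
                for lo, i, s in zip(roi_min, (ix, iy, iz), size)
            ))
    return points


# Hypothetical 2 x 1 x 2 occupancy grid over a 2 m x 1 m x 2 m RoI.
occ = [[[0.9, 0.1]], [[0.2, 0.8]]]
pts = place_pseudo_points((0.0, 0.0, 0.0), (2.0, 1.0, 2.0), occ)
```

With this grid only two of the four voxels pass the default threshold, so points are generated only where the prior indicates structure. A real pipeline would go further and refine the generated points, which this sketch does not attempt.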
