CV · Apr 8, 2025

POD: Predictive Object Detection with Single-Frame FMCW LiDAR Point Cloud

Yining Shi, Kun Jiang, Xin Zhao, Kangan Qian, Chuchu Xie, Tuopu Wen, Mengmeng Yang, Diange Yang

Tsinghua

arXiv:2504.05649v16.21 citations · h-index: 13

Originality: Incremental advance

AI Analysis

The work addresses the need for faster response to dangers in autonomous driving by avoiding multi-frame historical data. It is incremental in that it builds on existing detection methods, although it introduces a novel task.

The paper tackles 3D object detection for autonomous driving by introducing predictive object detection (POD), which predicts short-term future object locations and dimensions using only a single frame of FMCW LiDAR data with radial velocity, achieving state-of-the-art performance on an in-house dataset.

LiDAR-based 3D object detection is a fundamental task in the field of autonomous driving. This paper explores the unique advantage of Frequency Modulated Continuous Wave (FMCW) LiDAR in autonomous perception. Given a single-frame FMCW point cloud with radial velocity measurements, we expect that our object detector can detect the short-term future locations of objects using only the current frame sensor data and demonstrate a fast ability to respond to immediate danger. To achieve this, we extend the standard object detection task to a novel task named predictive object detection (POD), which aims to predict the short-term future location and dimensions of objects based solely on current observations. Typically, a motion prediction task requires historical sensor information to process the temporal contexts of each object, while our detector's avoidance of multi-frame historical information enables a much faster response time to potential dangers. The core advantage of FMCW LiDAR lies in the radial velocity associated with every reflected point. We propose a novel POD framework, the core idea of which is to generate a virtual future point using a ray casting mechanism, create virtual two-frame point clouds with the current and virtual future frames, and encode these two-frame voxel features with a sparse 4D encoder. Subsequently, the 4D voxel features are separated by temporal indices and remapped into two Bird's Eye View (BEV) features: one decoded for standard current frame object detection and the other for future predictive object detection. Extensive experiments on our in-house dataset demonstrate the state-of-the-art standard and predictive detection performance of the proposed POD framework.
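The virtual two-frame idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the ray casting mechanism, so the code simply assumes that each point slides along its sensor ray by radial velocity times a prediction horizon. The function names, the `dt` parameter, the sign convention (positive velocity means moving away from the sensor), and the 0/1 temporal index are all hypothetical choices made for illustration.

```python
import math


def make_virtual_two_frame(points, radial_velocities, dt):
    """Build a virtual two-frame point list from one FMCW frame.

    points: iterable of (x, y, z) in the sensor frame, in metres.
    radial_velocities: per-point radial velocity in m/s
        (assumed convention: positive = moving away from the sensor).
    dt: assumed prediction horizon in seconds (hypothetical parameter).

    Returns a list of (x, y, z, t) tuples, where t = 0 marks a current
    point and t = 1 marks its virtual future counterpart.
    """
    two_frame = []
    for (x, y, z), v_r in zip(points, radial_velocities):
        rng = math.sqrt(x * x + y * y + z * z)
        two_frame.append((x, y, z, 0))
        if rng == 0.0:
            # No ray direction is defined at the sensor origin.
            continue
        # Move the point along its own ray: new range = rng + v_r * dt.
        scale = (rng + v_r * dt) / rng
        two_frame.append((x * scale, y * scale, z * scale, 1))
    return two_frame


def split_by_temporal_index(two_frame):
    """Separate the combined points back into current and future sets."""
    current = [p[:3] for p in two_frame if p[3] == 0]
    future = [p[:3] for p in two_frame if p[3] == 1]
    return current, future
```

Under these assumptions, a point at (3, 4, 0) with a radial velocity of 5 m/s and `dt = 0.1` has range 5 m and moves to approximately (3.3, 4.4, 0). In the framework described above, the two sets would be voxelized and encoded jointly by a sparse 4D encoder before being split by temporal index; the sketch covers only the point-level step and the split.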
