CV · Apr 25, 2021

Temp-Frustum Net: 3D Object Detection with Temporal Fusion

arXiv:2104.12106v2 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work improves 3D object detection for automated driving systems by mitigating single-frame failures and transient occlusions. The advance is incremental: it adds a novel temporal fusion module to existing frustum networks.

The paper addresses the noise, field-of-view obstruction, and sparsity that affect frame-by-frame 3D object detection in automated driving. With temporal fusion, it reports improvements of ~6%, ~4%, and ~6% on the Car, Pedestrian, and Cyclist classes of the KITTI object tracking dataset, compared to frame-by-frame baselines.

3D object detection is a core component of automated driving systems. State-of-the-art methods fuse RGB imagery and LiDAR point cloud data frame-by-frame for 3D bounding box regression. However, frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity. We propose a novel Temporal Fusion Module (TFM) that uses information from previous time-steps to mitigate these problems. First, a state-of-the-art frustum network extracts point cloud features from raw RGB and LiDAR point cloud data frame-by-frame. Then, our TFM fuses these features with a recurrent neural network. As a result, 3D object detection becomes robust against single-frame failures and transient occlusions. Experiments on the KITTI object tracking dataset show the effectiveness of the proposed TFM, where we obtain ~6%, ~4%, and ~6% improvements on Car, Pedestrian, and Cyclist classes, respectively, compared to frame-by-frame baselines. Furthermore, ablation studies confirm that the improvement stems from temporal fusion and show the effects of different placements of the TFM in the object detection pipeline. Our code is open-source and available at https://github.com/emecercelik/Temp-Frustum-Net.git.
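
The two-stage idea in the abstract (per-frame feature extraction, then recurrent fusion across time-steps) can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says "a recurrent neural network", so the plain Elman-style cell below is an assumption, the weights are random and untrained, and the names `RecurrentFusion`, `matvec`, and `frames` are hypothetical. The per-frame vectors stand in for frustum-network features of one object, with the middle frame zeroed to mimic a single-frame failure.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class RecurrentFusion:
    """Toy Elman-style cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    Assumption: stands in for the paper's TFM, whose exact recurrent
    architecture is not given in the abstract.
    """

    def __init__(self, feature_dim, hidden_dim, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        scale = 1.0 / math.sqrt(hidden_dim)
        self.w_x = [[rng.uniform(-scale, scale) for _ in range(feature_dim)]
                    for _ in range(hidden_dim)]
        self.w_h = [[rng.uniform(-scale, scale) for _ in range(hidden_dim)]
                    for _ in range(hidden_dim)]
        self.bias = [0.0] * hidden_dim
        self.hidden_dim = hidden_dim

    def step(self, features, hidden):
        """Combine the current frame's features with the previous state."""
        from_input = matvec(self.w_x, features)
        from_state = matvec(self.w_h, hidden)
        return [math.tanh(a + b + c)
                for a, b, c in zip(from_input, from_state, self.bias)]

    def fuse(self, frame_features):
        """Process frames oldest-first; return the fused state after each frame."""
        hidden = [0.0] * self.hidden_dim
        fused = []
        for features in frame_features:
            hidden = self.step(features, hidden)
            fused.append(hidden)
        return fused


# Hypothetical per-frame features for one object; frame 2 is a dropout.
frames = [[0.9, 0.1, 0.4], [0.0, 0.0, 0.0], [0.8, 0.2, 0.5]]
fusion = RecurrentFusion(feature_dim=3, hidden_dim=4)
fused_states = fusion.fuse(frames)
```

Even though the second frame carries no information, its fused state is non-zero because the hidden state carries over from the first frame. A frame-by-frame model would have nothing to work with at that time-step.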

Code Implementations: 1 repo
