Frame Fusion with Vehicle Motion Prediction for 3D Object Detection
This work addresses the challenge of leveraging temporal information for more accurate 3D object detection in autonomous driving, though the contribution is incremental because it builds on existing detection methods.
The paper improves LiDAR-based 3D object detection by fusing history frames using vehicle motion prediction, yielding a consistent gain of about 2 points of vehicle level 2 APH on the Waymo Open Dataset with negligible added latency.
In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction. In the same way, history detections should contribute to future detections. In this paper, we propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history frames. In FrameFusion, we "forward" history frames to the current frame and apply weighted Non-Maximum-Suppression on the resulting dense bounding boxes to obtain a fused frame with merged boxes. To "forward" frames, we use vehicle motion models to estimate the future pose of the bounding boxes. However, the commonly used constant velocity model naturally fails on turning vehicles, so we explore two vehicle motion models to address this issue. On the Waymo Open Dataset, our FrameFusion method consistently improves the performance of various 3D detectors by about 2 vehicle level 2 APH with negligible latency and slightly enhances the performance of the temporal fusion method MPPNet. We also conduct extensive experiments on motion model selection.
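To illustrate why a constant velocity model can fail on a turning vehicle, the following minimal sketch forwards a box pose by a time step under two models. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the constant-speed, constant-yaw-rate (unicycle-style) model is used here only as one common turning model, since the abstract does not name the two models it explores.

```python
import math


def forward_constant_velocity(x, y, heading, vx, vy, dt):
    """Move the box centre along a straight line; the heading is unchanged."""
    return x + vx * dt, y + vy * dt, heading


def forward_constant_turn(x, y, heading, speed, yaw_rate, dt):
    """Assumed turning model: constant speed and constant yaw rate.

    The box centre follows a circular arc of radius speed / yaw_rate,
    and the heading rotates by yaw_rate * dt.
    """
    if abs(yaw_rate) < 1e-6:
        # Nearly straight motion: fall back to the straight-line update
        # to avoid dividing by a yaw rate close to zero.
        return (
            x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt,
            heading,
        )
    new_heading = heading + yaw_rate * dt
    radius = speed / yaw_rate
    new_x = x + radius * (math.sin(new_heading) - math.sin(heading))
    new_y = y - radius * (math.cos(new_heading) - math.cos(heading))
    return new_x, new_y, new_heading


# Hypothetical example: a vehicle at the origin heading along +x,
# moving at 10 m/s while turning left at 90 degrees per second.
cv_pose = forward_constant_velocity(0.0, 0.0, 0.0, 10.0, 0.0, 1.0)
turn_pose = forward_constant_turn(0.0, 0.0, 0.0, 10.0, math.pi / 2, 1.0)

# Distance between the two predicted centres after one second.
gap = math.hypot(cv_pose[0] - turn_pose[0], cv_pose[1] - turn_pose[1])
```

In this example the straight-line prediction keeps the original heading, while the turning model rotates the box and places its centre on an arc, so the two predicted boxes drift apart as the time step grows. A forwarded box that is misplaced in this way overlaps poorly with the current detection, which is what makes the choice of motion model matter before the boxes are merged.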