CV | AI | RO | Apr 25, 2025

A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection

arXiv:2504.18419v1 | 4 citations | h-index: 52 | Has Code | ECCV Workshops
Originality: Incremental advance
AI Analysis

This work addresses 3D object detection for autonomous driving systems, presenting an incremental improvement by enhancing existing single-modal detectors with a flexible fusion scheme.

The paper tackles 3D object detection from multimodal inputs by proposing a hybrid late-cascade fusion network that combines LiDAR and RGB data to reduce LiDAR false positives and recover LiDAR false negatives. It reports significant performance improvements on the KITTI benchmark, particularly for pedestrians and cyclists.

We present a new way to detect 3D objects from multimodal inputs, leveraging both LiDAR and RGB cameras in a hybrid late-cascade scheme that combines an RGB detection network and a 3D LiDAR detector. We exploit late fusion principles to reduce LiDAR False Positives, matching LiDAR detections with RGB ones by projecting the LiDAR bounding boxes onto the image. We rely on cascade fusion principles to recover LiDAR False Negatives, leveraging epipolar constraints and frustums generated by RGB detections from separate views. Our solution can be plugged on top of any underlying single-modal detectors, enabling a flexible training process that can take advantage of pre-trained LiDAR and RGB detectors or train the two branches separately. We evaluate our results on the KITTI object detection benchmark, showing significant performance improvements, especially for the detection of Pedestrians and Cyclists.
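The late-fusion step described in the abstract (project each LiDAR box onto the image, then keep it only if it matches an RGB detection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the IoU matching criterion, the 0.5 threshold, and the simplification to axis-aligned 3D boxes given in the camera frame are all assumptions made for illustration. The cascade step that recovers false negatives is not covered here.

```python
from itertools import product


def project_point(P, xyz):
    """Project a 3D point (camera frame) with a 3x4 matrix P.

    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    x, y, z = xyz
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return (h[0] / h[2], h[1] / h[2])


def box3d_to_2d(P, center, size):
    """Enclosing 2D box (x1, y1, x2, y2) of a projected 3D box.

    Assumption: the 3D box is axis-aligned (no yaw), which real
    detectors do not guarantee.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    corners = [
        (cx + dx * sx / 2, cy + dy * sy / 2, cz + dz * sz / 2)
        for dx, dy, dz in product((-1, 1), repeat=3)
    ]
    pts = [project_point(P, c) for c in corners]
    if any(p is None for p in pts):
        return None
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return (min(us), min(vs), max(us), max(vs))


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def late_fusion_filter(P, lidar_boxes, rgb_boxes, iou_thr=0.5):
    """Keep LiDAR boxes whose image projection matches some RGB box.

    lidar_boxes: list of (center, size) tuples in the camera frame.
    rgb_boxes:   list of (x1, y1, x2, y2) image detections.
    iou_thr:     hypothetical matching threshold.
    """
    kept = []
    for center, size in lidar_boxes:
        proj = box3d_to_2d(P, center, size)
        if proj is not None and any(iou(proj, r) >= iou_thr for r in rgb_boxes):
            kept.append((center, size))
    return kept
```

Calling `late_fusion_filter` with a projection matrix, a list of LiDAR boxes, and a list of RGB boxes returns only the LiDAR boxes supported by an image detection; unmatched boxes are treated as likely false positives and dropped.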

Code Implementations: 1 repo
