CV, LG, RO | Feb 25, 2022

Towards Safe, Real-Time Systems: Stereo vs Images and LiDAR for 3D Object Detection

arXiv:2202.12773v1
Originality: Incremental advance
AI Analysis

This work addresses the need for safe, real-time object detection systems by exploring stereo as a practical alternative to expensive LiDAR, though it appears incremental in its approach.

The paper evaluates stereo as a cost-effective alternative to monocular images or LiDAR for 3D object detection. It shows that multimodal learning with traditional disparity algorithms improves image-based results without adding parameters, and that learning over stereo error can approach LiDAR's 3D localization power in certain contexts. Benchmarking on KITTI also exposes a few small but common mistakes in how metrics are computed on that dataset, for which the authors offer efficient, provably correct alternatives.

As object detectors rapidly improve, attention has expanded past image-only networks to include a range of 3D and multimodal frameworks, especially ones that incorporate LiDAR. However, due to cost, logistics, and even some safety considerations, stereo can be an appealing alternative. Towards understanding the efficacy of stereo as a replacement for monocular input or LiDAR in object detectors, we show that multimodal learning with traditional disparity algorithms can improve image-based results without increasing the number of parameters, and that learning over stereo error can impart similar 3D localization power to LiDAR in certain contexts. Furthermore, doing so also has calibration benefits with respect to image-only methods. We benchmark on the public dataset KITTI, and in doing so, reveal a few small but common algorithmic mistakes currently used in computing metrics on that set, and offer efficient, provably correct alternatives.
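For readers unfamiliar with why stereo can stand in for LiDAR at all, the sketch below shows the standard pinhole relation between disparity and depth for a rectified stereo pair, along with the first-order growth of depth error with distance. It is a minimal illustration and not the paper's method: the function names, the focal length (720 px), and the baseline (0.5 m) are assumptions chosen for round numbers.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth from the rectified-stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        # Non-positive disparity is treated here as a point at infinity
        # (an assumption of this sketch; real pipelines often mark it invalid).
        return float("inf")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


# Hypothetical camera parameters, for illustration only.
FOCAL_PX = 720.0
BASELINE_M = 0.5

near = disparity_to_depth(36.0, FOCAL_PX, BASELINE_M)  # large disparity, close object
far = disparity_to_depth(9.0, FOCAL_PX, BASELINE_M)    # small disparity, distant object

# The same one-pixel matching error costs far more depth accuracy at range.
near_err = depth_error(near, FOCAL_PX, BASELINE_M)
far_err = depth_error(far, FOCAL_PX, BASELINE_M)
```

Because the error term scales with the square of depth, a fixed disparity error matters much more for distant objects than for nearby ones, which is one reason stereo-derived depth is usually expected to compete with LiDAR only in certain ranges or contexts.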

