CV | Feb 25

SF3D-RGB: Scene Flow Estimation from Monocular Camera and Sparse LiDAR

arXiv:2602.21699v1 | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses scene flow estimation for computer vision applications such as autonomous driving, but it is an incremental advance: it builds on existing fusion methods and mainly contributes efficiency improvements.

The paper tackles scene flow estimation by proposing SF3D-RGB, a deep learning architecture that fuses monocular images and sparse LiDAR point clouds. It achieves better accuracy than single-modality methods while using fewer parameters than other fusion-based state-of-the-art approaches.

Scene flow estimation is an important computer vision task that supports the perception of dynamic changes in a scene. Learning-based approaches have recently achieved impressive results for robust scene flow using either image-based or LiDAR-based inputs, but these methods have tended to rely on a single modality. To address this limitation, we present SF3D-RGB, a deep learning architecture that performs sparse scene flow estimation from 2D monocular images and 3D point clouds (e.g., acquired by LiDAR). Our end-to-end model first encodes each modality into features and fuses them. The fused features then feed a graph matching module, making the computation of the mapping matrix more accurate and robust, and this matrix yields an initial scene flow. Finally, a residual scene flow module refines the initial estimate. The model is designed to balance accuracy and efficiency. Experiments show that our method outperforms single-modality methods and achieves better scene flow accuracy on real-world datasets while using fewer parameters than other fusion-based state-of-the-art methods.
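The abstract names the pipeline stages but not their internals. As a rough illustration only, the NumPy sketch below models two of them under stated assumptions: fusion is done by projecting each LiDAR point into the image with a pinhole intrinsic matrix `K` and concatenating the image feature at that pixel, and the mapping matrix is computed by Sinkhorn normalization of cosine feature similarity. Sinkhorn matching is a generic, swapped-in technique and is not claimed to be SF3D-RGB's graph matching module. All function names and the `temperature` and `n_iters` parameters are hypothetical. The learned encoders and the residual refinement module are not modeled; features are passed in as plain arrays.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along one axis, keeping dims for broadcasting."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def project_points(points, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates.

    Assumes every point lies in front of the camera (z > 0).
    """
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def fuse_features(points, point_feats, image_feats, K):
    """Hypothetical fusion: append the image feature at each point's projected pixel.

    image_feats has shape (H, W, C_img); lookup is nearest-pixel, clamped to the border.
    """
    h, w, _ = image_feats.shape
    uv = np.rint(project_points(points, K)).astype(int)
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    return np.concatenate([point_feats, image_feats[v, u]], axis=1)


def soft_mapping_matrix(feat_src, feat_tgt, temperature=0.1, n_iters=20):
    """(N, M) soft correspondence matrix from Sinkhorn-normalized cosine similarity.

    Rows sum to 1, so row i is a distribution over target points for source point i.
    """
    a = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    b = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    log_m = (a @ b.T) / temperature
    for _ in range(n_iters):
        log_m = log_m - _logsumexp(log_m, axis=1)  # normalize rows
        log_m = log_m - _logsumexp(log_m, axis=0)  # normalize columns
    log_m = log_m - _logsumexp(log_m, axis=1)      # finish on rows so each sums to 1
    return np.exp(log_m)


def initial_scene_flow(src_xyz, tgt_xyz, feat_src, feat_tgt, **kwargs):
    """Initial flow: mapping-weighted average of target positions minus source positions."""
    m = soft_mapping_matrix(feat_src, feat_tgt, **kwargs)
    return m @ tgt_xyz - src_xyz
```

In this sketch each source point's initial flow is the mapping-weighted average of target positions minus its own position; a lower `temperature` makes the mapping closer to a hard one-to-one assignment. A residual module of the kind the abstract describes would then predict a per-point correction to be added to this initial estimate.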
