CV · Jun 2, 2025

MS-RAFT-3D: A Multi-Scale Architecture for Recurrent Image-Based Scene Flow

arXiv:2506.01443v1 · 1 citation · h-index: 6 · Has Code · ICIP
Originality: Incremental advance
AI Analysis

This work addresses scene flow estimation for computer vision applications, representing an incremental advancement by adapting hierarchical concepts from optical flow.

The paper tackles image-based scene flow estimation by developing a multi-scale recurrent architecture that improves on the previous state of the art by 8.7% on KITTI and 65.8% on Spring.

Although multi-scale concepts have recently proven useful for recurrent network architectures in optical flow and stereo, they have not yet been considered for image-based scene flow. Hence, starting from a single-scale recurrent scene flow backbone, we develop a multi-scale approach that generalizes successful hierarchical ideas from optical flow to image-based scene flow. By considering suitable concepts for the feature and the context encoder, the overall coarse-to-fine framework, and the training loss, we succeed in designing a scene flow approach that outperforms the current state of the art on KITTI and Spring by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. Our code is available at https://github.com/cv-stuttgart/MS-RAFT-3D.
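
The reported percentages can be checked against the raw scores quoted in the abstract. The sketch below assumes both benchmark scores are error metrics where lower is better (an inference from the numbers, since the abstract does not name the metrics) and that the improvement is measured relative to the previous best score. The helper name `relative_improvement` is hypothetical and not taken from the paper's code.

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Percentage reduction of an error metric relative to the baseline.

    Assumes lower scores are better (an assumption, not stated in the abstract).
    """
    return 100.0 * (baseline - ours) / baseline


# Scores quoted in the abstract: (this work, previous state of the art)
kitti = relative_improvement(3.89, 4.26)
spring = relative_improvement(9.13, 26.71)

print(f"KITTI:  {kitti:.1f}%")
print(f"Spring: {spring:.1f}%")
```

Under these assumptions both values round to the figures given in the abstract, so the percentages describe relative error reductions rather than absolute differences in the scores.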

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.