CV · Jun 2, 2025

MS-RAFT-3D: A Multi-Scale Architecture for Recurrent Image-Based Scene Flow

Jakob Schmid, Azin Jahedi, Noah Berenguel Senn, Andrés Bruhn

arXiv:2506.01443v16.21 citations · h-index: 6 · Has Code · ICIP

Originality: Incremental advance

AI Analysis

This work addresses image-based scene flow estimation. It is an incremental advance that adapts hierarchical concepts from optical flow to scene flow.

The paper develops a multi-scale recurrent architecture for image-based scene flow estimation and improves on the previous state of the art by 8.7% on KITTI and 65.8% on Spring.

Although multi-scale concepts have recently proven useful for recurrent network architectures in optical flow and stereo, they have not yet been considered for image-based scene flow. Hence, based on a single-scale recurrent scene flow backbone, we develop a multi-scale approach that generalizes successful hierarchical ideas from optical flow to image-based scene flow. By considering suitable concepts for the feature and context encoders, the overall coarse-to-fine framework, and the training loss, we succeed in designing a scene flow approach that outperforms the current state of the art on KITTI and Spring by 8.7% (3.89 vs. 4.26) and 65.8% (9.13 vs. 26.71), respectively. Our code is available at https://github.com/cv-stuttgart/MS-RAFT-3D.
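The percentages quoted above can be reproduced from the paired values, assuming both are error measures (lower is better) and that each percentage is the reduction relative to the previous best result. This is a minimal arithmetic sketch; the function name `relative_improvement` is illustrative and not taken from the paper or its code.

```python
def relative_improvement(new_error: float, baseline_error: float) -> float:
    """Percentage reduction of new_error relative to baseline_error.

    Assumption: both values are errors, so a smaller value is better.
    """
    return 100.0 * (baseline_error - new_error) / baseline_error


# Value pairs as quoted in the abstract: (this work, previous state of the art).
kitti = relative_improvement(3.89, 4.26)
spring = relative_improvement(9.13, 26.71)

print(f"KITTI:  {kitti:.1f}%")   # KITTI:  8.7%
print(f"Spring: {spring:.1f}%")  # Spring: 65.8%
```

Under this reading, the KITTI gain is (4.26 - 3.89) / 4.26 ≈ 0.087 and the Spring gain is (26.71 - 9.13) / 26.71 ≈ 0.658, which match the stated 8.7% and 65.8%.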
