CV · Jul 12, 2022

M-FUSE: Multi-frame Fusion for Scene Flow Estimation

arXiv:2207.05704v2 · 20 citations · h-index: 35 · Has Code
Originality: Incremental advance
AI Analysis

This work improves the accuracy of scene flow estimation for autonomous driving through temporal fusion. It is an incremental advance, since it builds upon RAFT-3D.

The paper tackles a limitation of existing scene flow estimation networks, which use only two frame pairs, by proposing a multi-frame approach that incorporates an additional preceding stereo pair. On the KITTI benchmark, the method outperforms the original RAFT-3D by more than 16% and ranks second overall.

Recently, neural networks for scene flow estimation have shown impressive results on automotive data such as the KITTI benchmark. However, despite using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to two frame pairs, which prevents them from exploiting temporal information. In our paper we address this shortcoming by proposing a novel multi-frame approach that considers an additional preceding stereo pair. To this end, we proceed in two steps: First, building upon the recent RAFT-3D approach, we develop an improved two-frame baseline by incorporating an advanced stereo method. Second, and even more importantly, exploiting the specific modeling concepts of RAFT-3D, we propose a U-Net architecture that fuses forward and backward flow estimates and hence allows temporal information to be integrated on demand. Experiments on the KITTI benchmark not only show that the advantages of the improved baseline and the temporal fusion approach complement each other, but also demonstrate that the computed scene flow is highly accurate. More precisely, our approach ranks second overall and first for the even more challenging foreground objects, in total outperforming the original RAFT-3D method by more than 16%. Code is available at https://github.com/cv-stuttgart/M-FUSE.
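
To make the idea of fusing forward and backward estimates concrete, here is a minimal sketch. It is not the paper's method: the learned U-Net is replaced by a hand-written per-pixel weighted blend, the backward motion is turned into a forward prediction under an assumed constant velocity, and the names `negate`, `fuse_pixel`, `fuse_field` and the weight `w_forward` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # one 3D motion vector per pixel


def negate(flow: Vec3) -> Vec3:
    """Turn a backward motion (t -> t-1) into a forward prediction (t -> t+1).

    Assumption (not from the paper): velocity is constant over the three frames.
    """
    return (-flow[0], -flow[1], -flow[2])


def fuse_pixel(forward: Vec3, backward: Vec3, w_forward: float) -> Vec3:
    """Blend the forward estimate with the prediction derived from the backward one.

    w_forward is a hypothetical per-pixel confidence in the forward estimate;
    in a learned fusion network such a weighting would be predicted, not given.
    """
    if not 0.0 <= w_forward <= 1.0:
        raise ValueError("w_forward must lie in [0, 1]")
    predicted = negate(backward)
    return tuple(
        w_forward * f + (1.0 - w_forward) * p for f, p in zip(forward, predicted)
    )


def fuse_field(
    forward: List[Vec3], backward: List[Vec3], weights: List[float]
) -> List[Vec3]:
    """Apply the per-pixel blend to a flattened field of motion vectors."""
    if not (len(forward) == len(backward) == len(weights)):
        raise ValueError("all inputs must have the same number of pixels")
    return [fuse_pixel(f, b, w) for f, b, w in zip(forward, backward, weights)]
```

With `w_forward = 1.0` a pixel keeps its two-frame forward estimate unchanged, and lower weights pull it toward the prediction from the preceding frame. This loosely mirrors the idea of integrating temporal information only where it helps.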

Code Implementations: 1 repo
