A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions
This work addresses motion estimation challenges in applications such as environmental perception for vehicles. It offers a fast multi-frame extension that improves scene flow accuracy, although the contribution is incremental in that it builds on existing dual-frame estimators.
The paper tackles motion estimation in computer vision, specifically the occlusions and out-of-view motions that limit dual-frame methods. It proposes a data-driven temporal fusion framework for a multi-frame setup that learns motion relations from data and refines scene flow estimates, improving on the underlying dual-frame methods.
Motion estimation is one of the core challenges in computer vision. With traditional dual-frame approaches, occlusions and out-of-view motions are a limiting factor, especially in the context of environmental perception for vehicles, because of the large (ego-)motion of objects. Our work proposes a novel data-driven approach for temporal fusion of scene flow estimates in a multi-frame setup to overcome the issue of occlusion. Contrary to most previous methods, we do not rely on a constant motion model, but instead learn a generic temporal relation of motion from data. In a second step, a neural network combines bi-directional scene flow estimates from a common reference frame, yielding a refined estimate and, as a natural byproduct, occlusion masks. This way, our approach provides a fast multi-frame extension for a variety of scene flow estimators, which outperforms the underlying dual-frame approaches.
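The two-step idea can be illustrated with a minimal NumPy sketch. It is not the method of the paper: the learned motion model is replaced here by a plain constant-motion stand-in (the very assumption the paper avoids), and the fusion weights, which the paper obtains from a neural network, are set by hand. All names (`constant_motion_model`, `fuse_scene_flow`, `occlusion_weight`) and the four-channel flow layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def constant_motion_model(backward_flow):
    """Stand-in for the learned motion model (assumption, not the paper's model).

    Under constant motion, the flow from the reference frame to the next
    frame is the negated flow from the reference frame to the previous one.
    """
    return -backward_flow


def fuse_scene_flow(forward_flow, backward_flow, occlusion_weight,
                    motion_model=constant_motion_model):
    """Per-pixel convex combination of two forward-motion hypotheses.

    forward_flow, backward_flow: arrays of shape (H, W, C), both defined
        in the common reference frame.
    occlusion_weight: array of shape (H, W) with values in [0, 1];
        1 means the direct forward estimate is treated as occluded.
        In the paper such a mask is produced by a network; here it is given.
    """
    predicted_forward = motion_model(backward_flow)
    w = occlusion_weight[..., None]  # broadcast over the channel axis
    return (1.0 - w) * forward_flow + w * predicted_forward


# Toy data: uniform motion, with one pixel of the forward estimate corrupted.
H, W, C = 2, 3, 4
true_forward = np.full((H, W, C), 1.0)
forward = true_forward.copy()
forward[0, 0] = 9.0                 # simulated occlusion error
backward = -true_forward            # consistent with constant motion
occlusion_weight = np.zeros((H, W))
occlusion_weight[0, 0] = 1.0        # hand-set mask marking the bad pixel

fused = fuse_scene_flow(forward, backward, occlusion_weight)
```

In this toy case the corrupted pixel is replaced by the hypothesis derived from the backward estimate, while every other pixel keeps its direct forward estimate. The weight map that drives this choice is what plays the role of an occlusion mask.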