CV, Apr 2, 2021

Optical Flow Dataset Synthesis from Unpaired Images

arXiv:2104.02615v1
Originality: Incremental advance
AI Analysis

This work addresses two problems in supervised optical flow training: the domain gap between synthetic and real data, and errors in ground truth derived from real sensor measurements. The proposed solution is practical and effective, though incremental.

The paper tackles the challenge of training supervised optical flow methods by introducing a novel dataset synthesis approach that creates pseudo-real image pairs from unpaired real frames, simulating warps, occlusions, shadows, and illumination changes to provide exact ground truth. This method achieves state-of-the-art or competitive performance on Sintel and KITTI benchmarks, with results on par with more complex training approaches.

The estimation of optical flow is an ambiguous task due to the lack of correspondence at occlusions, shadows, reflections, lack of texture and changes in illumination over time. Thus, unsupervised methods face major challenges as they need to tune complex cost functions with several terms designed to handle each of these sources of ambiguity. In contrast, supervised methods avoid these challenges altogether by relying on explicit ground truth optical flow obtained directly from synthetic or real data. In the case of synthetic data, the ground truth provides an exact and explicit description of what optical flow to assign to a given scene. However, the domain gap between synthetic data and real data often limits the ability of a trained network to generalize. In the case of real data, the ground truth is obtained through multiple sensors and additional data processing, which might introduce persistent errors and contaminate it. As a solution to these issues, we introduce a novel method to build a training set of pseudo-real images that can be used to train optical flow in a supervised manner. Our dataset uses two unpaired frames from real data and creates pairs of frames by simulating random warps, occlusions with super-pixels, shadows and illumination changes, and associates them with their corresponding exact optical flow. We thus obtain the benefit of directly training on real data while having access to an exact ground truth. Training with our datasets on the Sintel and KITTI benchmarks is straightforward and yields models with performance on par with, or at, the state of the art, compared with much more sophisticated training approaches.
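
The core idea in the abstract, building a frame pair from real imagery so that the flow is known by construction, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a random affine flow as the "random warp", a single global gain as the illumination change, and a rectangular patch copied from the second unpaired frame as a stand-in for a super-pixel occluder; shadows are omitted. All function names and parameters (`random_affine_flow`, `backward_warp`, `synthesize_pair`, `max_shift`, `gain_range`, `occ_size`) are hypothetical.

```python
import numpy as np


def random_affine_flow(h, w, rng, max_shift=3.0, max_linear=0.02):
    """Sample a smooth flow field u(x) = A x + t (assumed warp model)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Centre the coordinates so the linear part acts around the image centre.
    xc, yc = xs - (w - 1) / 2.0, ys - (h - 1) / 2.0
    A = rng.uniform(-max_linear, max_linear, size=(2, 2))
    t = rng.uniform(-max_shift, max_shift, size=2)
    u = A[0, 0] * xc + A[0, 1] * yc + t[0]  # horizontal displacement
    v = A[1, 0] * xc + A[1, 1] * yc + t[1]  # vertical displacement
    return np.stack([u, v], axis=-1)


def backward_warp(img, flow):
    """Bilinearly sample img at x + flow(x), clamping to the image border."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    if img.ndim == 3:  # broadcast the weights over colour channels
        wx, wy = wx[..., None], wy[..., None]
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom


def synthesize_pair(frame, other, rng, flow=None,
                    gain_range=(0.8, 1.2), occ_size=8):
    """Build (frame1, frame2, flow, occluder_mask) from two unpaired frames.

    Assumes `frame` and `other` have the same shape.
    """
    frame = np.asarray(frame, dtype=np.float64)
    other = np.asarray(other, dtype=np.float64)
    h, w = frame.shape[:2]
    if flow is None:
        flow = random_affine_flow(h, w, rng)
    # frame1(x) = frame(x + flow(x)): the flow from frame1 to the real frame
    # is known by construction (up to interpolation and border clamping).
    frame1 = backward_warp(frame, flow)
    # Global illumination change on the second frame (assumed gain model).
    frame2 = frame * rng.uniform(*gain_range)
    # Rectangular occluder taken from the unpaired frame; a simplification
    # of the super-pixel occluders described in the abstract.
    oh, ow = min(occ_size, h), min(occ_size, w)
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))
    frame2[top:top + oh, left:left + ow] = other[top:top + oh, left:left + ow]
    occluder_mask = np.zeros((h, w), dtype=bool)
    occluder_mask[top:top + oh, left:left + ow] = True
    return frame1, frame2, flow, occluder_mask
```

Treating the real frame as the second image and backward-warping it to obtain the first avoids forward splatting, so no holes need to be filled. A call such as `synthesize_pair(frame, other, np.random.default_rng(0))` returns one training sample; the mask marks the pixels in the second frame that are covered by the occluder.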

