CV · Jan 17, 2025

Zero-Shot Monocular Scene Flow Estimation in the Wild

arXiv:2501.10357v2 · 20 citations · h-index: 29 · CVPR
Originality: Incremental advance
AI Analysis

This work aims to make scene flow prediction practical for real-world applications such as robotics and video analysis by improving generalization. The advance is incremental: it builds on existing scene flow concepts and adds a new data recipe and a better parameterization.

The paper addresses the poor cross-dataset generalization of scene flow estimation. It proposes a method that jointly estimates geometry and motion, builds a large-scale synthetic dataset, and adopts an effective parameterization. The resulting model outperforms existing methods in 3D end-point error and generalizes zero-shot to DAVIS and RoboTAP videos.

Large models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such general models exist for scene flow. Even though scene flow has wide potential use, it is not used in practice because current predictive models do not generalize well. We identify three key challenges and propose solutions for each. First, we create a method that jointly estimates geometry and motion for accurate prediction. Second, we alleviate scene flow data scarcity with a data recipe that affords us 1M annotated training samples across diverse synthetic scenes. Third, we evaluate different parameterizations for scene flow prediction and adopt a natural and effective parameterization. Our resulting model outperforms existing methods as well as baselines built on large-scale models in terms of 3D end-point error, and shows zero-shot generalization to the casually captured videos from DAVIS and the robotic manipulation scenes from RoboTAP. Overall, our approach makes scene flow prediction more practical in-the-wild.
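
The abstract reports results in terms of 3D end-point error. As a rough illustration of that metric, the sketch below computes the mean Euclidean distance between predicted and ground-truth 3D flow vectors. The function name `end_point_error_3d`, the `(..., 3)` array layout, and the optional `valid_mask` argument are assumptions made for this example; the paper's actual evaluation protocol (masking, scale alignment, units) is not described on this page and may differ.

```python
import numpy as np


def end_point_error_3d(pred_flow, gt_flow, valid_mask=None):
    """Mean 3D end-point error between two scene flow fields.

    pred_flow, gt_flow: arrays of shape (..., 3), one 3D motion vector per point
                        or pixel (layout is an assumption of this sketch).
    valid_mask:         optional boolean array matching the leading dimensions,
                        selecting which points contribute to the average.
    """
    pred_flow = np.asarray(pred_flow, dtype=np.float64)
    gt_flow = np.asarray(gt_flow, dtype=np.float64)
    if pred_flow.shape != gt_flow.shape or pred_flow.shape[-1] != 3:
        raise ValueError("expected two arrays of identical shape (..., 3)")

    # Per-point Euclidean distance between predicted and true motion vectors.
    per_point_error = np.linalg.norm(pred_flow - gt_flow, axis=-1)

    if valid_mask is not None:
        per_point_error = per_point_error[np.asarray(valid_mask, dtype=bool)]
    if per_point_error.size == 0:
        raise ValueError("no valid points to evaluate")

    return float(per_point_error.mean())
```

Called on two `(H, W, 3)` flow maps, this returns a single scalar in the same units as the flow vectors; lower is better.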

