CV, Mar 30

FlowIt: Global Matching for Optical Flow with Confidence-Guided Refinement

arXiv:2603.28759
52.9
h-index: 33
AI Analysis

This paper addresses robust motion estimation in computer vision; its improvements over existing methods are significant but incremental.

The paper tackles optical flow estimation under large pixel displacements. It introduces FlowIt, which combines a hierarchical transformer with an optimal-transport formulation of global matching, and reports state-of-the-art results on the Sintel and KITTI benchmarks along with new state-of-the-art zero-shot cross-dataset generalization on Sintel, Spring, and LayeredFlow.

We present FlowIt, a novel architecture for optical flow estimation designed to robustly handle large pixel displacements. At its core, FlowIt leverages a hierarchical transformer architecture that captures extensive global context, enabling it to model long-range correspondences effectively. To overcome the limitations of localized matching, we formulate the flow initialization as an optimal transport problem. This formulation yields a highly robust initial flow field, alongside explicitly derived occlusion and confidence maps. These cues are then seamlessly integrated into a guided refinement stage, where the network actively propagates reliable motion estimates from high-confidence regions into ambiguous, low-confidence areas. Extensive experiments across the Sintel, KITTI, Spring, and LayeredFlow datasets validate the efficacy of our approach. FlowIt achieves state-of-the-art results on the competitive Sintel and KITTI benchmarks, while simultaneously establishing new state-of-the-art cross-dataset zero-shot generalization performance on Sintel, Spring, and LayeredFlow.
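
To make the idea of flow initialization as an optimal transport problem concrete, here is a minimal sketch in Python with NumPy. It is not the FlowIt implementation. It assumes per-pixel feature vectors are already available (in the paper they would come from the hierarchical transformer), uses plain Sinkhorn iterations with uniform marginals, and takes the peak of each row of the normalized transport plan as a stand-in confidence score. The function names `sinkhorn_plan` and `global_match`, the cosine-distance cost, and the values of `eps` and `n_iters` are all illustrative assumptions; the occlusion map described in the abstract is not modeled here.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.05, n_iters=100):
    """Entropy-regularised optimal transport plan between uniform marginals.

    Assumption: plain Sinkhorn iterations; the paper's exact solver is not
    specified in the abstract.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # mass carried by each frame-1 pixel
    b = np.full(m, 1.0 / m)      # mass expected at each frame-2 pixel
    K = np.exp(-cost / eps)      # Gibbs kernel of the cost matrix
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match marginal a
        v = b / (K.T @ u)        # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]


def global_match(feat1, feat2, h, w, eps=0.05):
    """Return (flow, confidence) from two (h*w, d) feature arrays.

    flow has shape (h, w, 2) holding (dx, dy); confidence has shape (h, w).
    """
    # Cosine distance between every frame-1 and frame-2 pixel (all pairs,
    # so matching is global rather than restricted to a local window).
    f1 = feat1 / np.linalg.norm(feat1, axis=1, keepdims=True)
    f2 = feat2 / np.linalg.norm(feat2, axis=1, keepdims=True)
    cost = 1.0 - f1 @ f2.T

    plan = sinkhorn_plan(cost, eps)
    match = plan / plan.sum(axis=1, keepdims=True)   # one distribution per row

    # Pixel coordinates as (x, y), in the same row-major order as the features.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)

    flow = match @ coords - coords       # expected target position minus source
    confidence = match.max(axis=1)       # peaked rows indicate reliable matches
    return flow.reshape(h, w, 2), confidence.reshape(h, w)
```

Because the cost matrix covers all pixel pairs, a displacement of any size costs no more to represent than a small one, which is the property the abstract appeals to for large motions. A refinement stage like the one described could then use `confidence` to decide which estimates to trust and which to overwrite from neighbouring regions.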
