CV · Apr 23, 2023

TransFlow: Transformer as Flow Learner

arXiv:2304.11523v1 · 74 citations · h-index: 24
Originality: Highly original
AI Analysis

This paper addresses optical flow estimation for computer vision tasks, offering a transformer-based method that improves accuracy and simplifies training.

The authors propose TransFlow, a pure transformer architecture for optical flow estimation, and report state-of-the-art results on the Sintel and KITTI-15 benchmarks, with advantages in matching accuracy, recovery of information degraded by occlusion and motion blur, and a simpler training procedure.

Optical flow is an indispensable building block for many important computer vision tasks, including motion estimation, object tracking, and disparity measurement. In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and more trustworthy matching in flow estimation by using spatial self-attention and cross-attention between adjacent frames to capture global dependencies. Second, it recovers more compromised information (e.g., from occlusion and motion blur) through long-range temporal association in dynamic scenes. Third, it enables a concise self-learning paradigm and eliminates the complex and laborious multi-stage pre-training procedures. We achieve state-of-the-art results on Sintel and KITTI-15, as well as on several downstream tasks, including video object detection, interpolation, and stabilization. Given its efficacy, we hope TransFlow can serve as a flexible baseline for optical flow estimation.
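To make the idea of cross-attention between adjacent frames concrete, the following is a minimal NumPy sketch and not the TransFlow architecture itself. It assumes per-pixel feature maps are already available, skips the learned query/key projections and the self-attention layers, and reads flow out of the attention weights as an expected displacement (a soft-argmax readout chosen here for illustration). The function name `cross_attention_flow` is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_flow(feat1, feat2):
    """Illustrative only: estimate flow from frame 1 to frame 2.

    feat1, feat2: (H, W, C) feature maps of two adjacent frames
    (assumed given; a real model would learn them).
    Returns an (H, W, 2) array of (dx, dy) displacements.
    """
    h, w, c = feat1.shape
    q = feat1.reshape(h * w, c)  # queries: every pixel of frame 1
    k = feat2.reshape(h * w, c)  # keys: every pixel of frame 2

    # Global correlation: each frame-1 pixel attends to all frame-2 pixels.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (H*W, H*W)

    # Pixel coordinates in (x, y) order, flattened like the features.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matching position under the attention distribution,
    # minus the source position, gives the displacement.
    matched = attn @ coords
    return (matched - coords).reshape(h, w, 2)


if __name__ == "__main__":
    # Toy input: unique feature per pixel, frame 2 is frame 1 shifted
    # one pixel to the right (with wrap-around at the border).
    h, w = 4, 5
    frame1 = 50.0 * np.eye(h * w).reshape(h, w, h * w)
    frame2 = np.roll(frame1, 1, axis=1)
    flow = cross_attention_flow(frame1, frame2)
    print(flow[0, 0])  # displacement of the top-left pixel
```

Because every pixel attends to every other pixel, the attention matrix grows quadratically with image size; this sketch is only practical on small or downsampled feature maps.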

