CV · Mar 30, 2022

FlowFormer: A Transformer Architecture for Optical Flow

arXiv:2203.16194v4 · 439 citations
Originality: Highly original
AI Analysis

This work addresses optical flow estimation, a core computer vision task, and reports lower end-point error on the Sintel benchmark than the best previously published results.

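The paper introduces FlowFormer, a transformer-based architecture that tokenizes the 4D cost volume built from an image pair, encodes the tokens into a cost memory with alternate-group transformer (AGT) layers, and decodes that memory with a recurrent transformer decoder. It reports a 16.5% error reduction on the Sintel clean pass and, when the model is not trained on Sintel, a 21.7% lower error on the clean pass of the Sintel training set than the best published result.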
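To make the "4D cost volume" concrete, here is a minimal NumPy sketch. It assumes an all-pairs dot-product similarity with a 1/sqrt(D) scaling, which is a common construction and not confirmed by this page as the authors' exact formulation. The function names, feature shapes, and random features are hypothetical, and the sketch does not cover the paper's tokenization, AGT layers, or decoder.

```python
import numpy as np


def build_cost_volume(feat1: np.ndarray, feat2: np.ndarray) -> np.ndarray:
    """All-pairs similarity between two (H, W, D) feature maps.

    Assumption: dot-product similarity scaled by 1/sqrt(D).
    Returns an array of shape (H, W, H, W): entry [i, j, k, l] compares
    source pixel (i, j) in image 1 with target pixel (k, l) in image 2.
    """
    d = feat1.shape[-1]
    return np.einsum("ijd,kld->ijkl", feat1, feat2) / np.sqrt(d)


def cost_maps_per_source_pixel(cost: np.ndarray) -> np.ndarray:
    """View the 4D volume as H*W separate 2D cost maps, one per source pixel."""
    h, w = cost.shape[:2]
    return cost.reshape(h * w, h, w)


# Hypothetical random features standing in for a learned feature encoder.
rng = np.random.default_rng(0)
feat1 = rng.standard_normal((4, 6, 8))
feat2 = rng.standard_normal((4, 6, 8))

cost = build_cost_volume(feat1, feat2)
maps = cost_maps_per_source_pixel(cost)
print(cost.shape, maps.shape)
```

Each of the H*W cost maps is a 2D image in its own right, which is what makes it possible to cut the volume into tokens for a transformer.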

We introduce optical Flow transFormer, dubbed FlowFormer, a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves 1.159 and 2.088 average end-point-error (AEPE) on the clean and final pass, a 16.5% and 15.5% error reduction from the best published result (1.388 and 2.47). FlowFormer also generalizes well: without being trained on Sintel, it achieves 1.01 AEPE on the clean pass of the Sintel training set, outperforming the best published result (1.29) by 21.7%.
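The percentages in the abstract can be reproduced from the reported AEPE values. The sketch below uses the standard definition of AEPE (the mean Euclidean distance between predicted and ground-truth flow vectors) and computes relative error reduction as (baseline - new) / baseline. The helper names and the toy flow vectors are illustrative assumptions; only the AEPE figures come from the abstract.

```python
import math


def aepe(pred, gt):
    """Average end-point error over paired (u, v) flow vectors."""
    assert pred and len(pred) == len(gt)
    total = sum(math.hypot(pu - gu, pv - gv)
                for (pu, pv), (gu, gv) in zip(pred, gt))
    return total / len(pred)


def relative_reduction(baseline, new):
    """Error reduction of `new` relative to `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


# Toy example (hypothetical vectors): per-pixel errors are 0 and 5.
toy_pred = [(1.0, 0.0), (0.0, 2.0)]
toy_gt = [(1.0, 0.0), (3.0, -2.0)]
toy_error = aepe(toy_pred, toy_gt)

# AEPE values quoted in the abstract: (best published result, FlowFormer).
reported = {
    "Sintel clean": (1.388, 1.159),
    "Sintel final": (2.47, 2.088),
    "Sintel clean, not trained on Sintel": (1.29, 1.01),
}
reductions = {name: round(relative_reduction(base, new), 1)
              for name, (base, new) in reported.items()}

print(toy_error)
for name, pct in reductions.items():
    print(f"{name}: {pct}% lower AEPE")
```

Rounded to one decimal place, the three reductions come out to 16.5%, 15.5%, and 21.7%, matching the figures stated in the abstract.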

Code Implementations: 1 repo