CV | Mar 10, 2025

MambaFlow: A Mamba-Centric Architecture for End-to-End Optical Flow Estimation

arXiv:2503.07046v4 | 2 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses optical flow estimation, a core computer vision task, and presents an incremental advance: a new Mamba-based method tailored to that task.

The paper tackles optical flow estimation by introducing MambaFlow, a novel Mamba-centric architecture, achieving higher accuracy than SEA-RAFT on the Sintel benchmark.

Recently, the Mamba architecture has demonstrated significant successes in various computer vision tasks, such as classification and segmentation. However, its application to optical flow estimation remains unexplored. In this paper, we introduce MambaFlow, a novel framework designed to leverage the high accuracy and efficiency of the Mamba architecture for capturing locally correlated features while preserving global information in end-to-end optical flow estimation. To our knowledge, MambaFlow is the first architecture centered around the Mamba design tailored specifically for optical flow estimation. It comprises two key components: (1) PolyMamba, which enhances feature representation through a dual-Mamba architecture, incorporating a Self-Mamba module for intra-token modeling and a Cross-Mamba module for inter-modality interaction, enabling both deep contextualization and effective feature fusion; and (2) PulseMamba, which leverages an Attention Guidance Aggregator (AGA) to adaptively integrate features with dynamically learned weights in contrast to naive concatenation, and then employs the intrinsic recurrent mechanism of Mamba to perform autoregressive flow decoding, facilitating efficient flow information dissemination. Extensive experiments demonstrate that MambaFlow achieves remarkable results comparable to mainstream methods on benchmark datasets. Compared to SEA-RAFT, MambaFlow attains higher accuracy on the Sintel benchmark, demonstrating stronger potential for real-world deployment on resource-constrained devices. The source code will be made publicly available upon acceptance of the paper.
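The abstract names two mechanisms without showing them: fusing features with learned weights rather than concatenating them, and decoding through a recurrence. The sketch below illustrates both ideas in plain Python under strong simplifying assumptions. It is not the authors' implementation. The function names (`weighted_fuse`, `concat_fuse`, `linear_scan`) are hypothetical, the fusion scores are passed in by hand where a real model would predict them with a learned layer, and the recurrence uses fixed scalar parameters where Mamba makes them input-dependent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fuse(features, scores):
    """Fuse equal-length feature vectors as a softmax-weighted sum.

    Assumption: `scores` stands in for weights a network would learn.
    The output has the same width as each input vector.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def concat_fuse(features):
    """Naive alternative: concatenate, so the width grows with each input."""
    return [v for f in features for v in f]


def linear_scan(inputs, a, b, c):
    """Scalar linear state-space recurrence:

        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t

    Assumption: a, b, c are fixed here for illustration only.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # update the hidden state
        outputs.append(c * h)  # read out from the state
    return outputs


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(weighted_fuse(feats, [0.0, 0.0]))        # equal scores give equal weights
    print(len(concat_fuse(feats)))                 # width doubles under concatenation
    print(linear_scan([1.0, 0.0, 0.0], 0.5, 1.0, 1.0))  # impulse decays through the state
```

The contrast to notice is that the weighted sum keeps the feature width constant while concatenation grows it, and that the recurrence carries information forward through a single state updated once per step.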

