VMatcher: State-Space Semi-Dense Local Feature Matching
This work addresses efficiency bottlenecks in image feature matching for computer vision tasks, though the contribution is incremental because it combines existing methods.
The paper tackles the high computational cost of Transformer-based feature matching by introducing VMatcher, a hybrid Mamba-Transformer network that pairs Mamba's linear-complexity sequence processing with the Transformer's attention mechanism. The proposed configurations set new benchmarks efficiently, which makes them suited to real-time applications.
This paper introduces VMatcher, a hybrid Mamba-Transformer network for semi-dense feature matching between image pairs. Learning-based feature matching methods, whether detector-based or detector-free, achieve state-of-the-art performance but depend heavily on the Transformer's attention mechanism, which, while effective, incurs high computational costs due to its quadratic complexity. In contrast, Mamba introduces a Selective State-Space Model (SSM) that achieves comparable or superior performance with linear complexity, offering significant efficiency gains. VMatcher leverages a hybrid approach, integrating Mamba's highly efficient long-sequence processing with the Transformer's attention mechanism. Multiple VMatcher configurations are proposed, including hierarchical architectures, demonstrating their effectiveness in setting new benchmarks efficiently while ensuring robustness and practicality for real-time applications where rapid inference is crucial. Source code is available at: https://github.com/ayoussf/VMatcher
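To make the complexity contrast in the abstract concrete, the following is a minimal sketch of a toy scalar selective state-space scan next to a count of the token pairs that full self-attention would score. It is not VMatcher's or Mamba's actual implementation: the function names, the scalar parameters (`a`, `w_delta`, `w_b`, `w_c`), and the simplified discretization are all assumptions chosen for illustration.

```python
import math


def selective_scan(xs, a=-1.0, w_delta=0.5, w_b=1.0, w_c=1.0):
    """Toy 1-D selective state-space scan (all parameters are hypothetical).

    Each token updates a single hidden state with a constant amount of
    work, so the total cost grows linearly with the sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        # Input-dependent step size; softplus keeps it positive.
        # Making the step depend on the input is the "selective" part.
        delta = math.log1p(math.exp(w_delta * x))
        a_bar = math.exp(delta * a)  # discretized state transition
        b_bar = delta * w_b          # simplified discretized input weight
        h = a_bar * h + b_bar * x    # recurrent state update
        ys.append(w_c * h)           # readout
    return ys


def attention_pair_count(length):
    """Number of query-key pairs scored by full self-attention."""
    return length * length


def scan_step_count(length):
    """Number of recurrent updates performed by the scan above."""
    return length


if __name__ == "__main__":
    tokens = 4800  # hypothetical number of coarse feature tokens
    print(len(selective_scan([0.1] * tokens)))
    print(attention_pair_count(tokens), scan_step_count(tokens))
```

For a hypothetical sequence of 4800 tokens, full attention scores 23,040,000 pairs while the scan performs 4800 state updates. That gap is the motivation the abstract gives for handing long-sequence processing to Mamba-style blocks while keeping attention in the hybrid design.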