MambaGlue: Fast and Robust Local Feature Matching With Mamba
This work addresses the need for efficient and reliable feature matching in computer vision applications. It is an incremental advance that integrates the emerging Mamba architecture into local feature matching.
The paper tackles the problem of making local feature matching both robust and fast. It proposes MambaGlue, a novel Mamba-based approach built around two modules: a MambaAttention mixer and a deep confidence score regressor. Experiments on public datasets show substantial performance improvements over baselines while maintaining fast inference speed.
In recent years, robust matching methods based on deep learning have been actively studied and improved for computer vision tasks. However, there remains a persistent demand for matching techniques that are both robust and fast. To address this, we propose MambaGlue, a novel Mamba-based local feature matching approach. Mamba is an emerging state-of-the-art architecture that is rapidly gaining recognition for its superior speed in both training and inference and for performance that is promising compared with Transformer architectures. In particular, we propose two modules: a) a MambaAttention mixer, which simultaneously and selectively captures local and global context through a Mamba-based self-attention structure, and b) a deep confidence score regressor, a multi-layer perceptron (MLP)-based architecture that outputs a score indicating how confidently the matching predictions correspond to the ground-truth correspondences. Consequently, MambaGlue achieves a balance between robustness and efficiency in real-world applications. Experiments on various public datasets demonstrate that MambaGlue yields a substantial performance improvement over baseline approaches while maintaining fast inference speed. Our code will be available at https://github.com/url-kaist/MambaGlue
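The abstract does not specify the internals of either module. As a rough, non-authoritative illustration, the following sketch shows two generic building blocks under stated assumptions: (i) the input-dependent ("selective") state-space recurrence that Mamba layers are built on, reduced here to a single scalar channel with simplified discretization, and (ii) a generic two-layer MLP that maps a feature vector to a confidence value in [0, 1]. The function names `selective_scan` and `mlp_confidence`, all weights, and the scalar simplification are hypothetical. This is not the MambaGlue implementation, and it omits the self-attention branch of the MambaAttention mixer entirely.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); always >= 0.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def selective_scan(xs: Sequence[float], a: float = -1.0,
                   w_delta: float = 1.0, w_b: float = 1.0,
                   w_c: float = 1.0) -> List[float]:
    """Hypothetical single-channel, Mamba-style selective state-space scan.

    The step size delta_t, input term B_t, and readout C_t all depend on the
    current input x_t; this input dependence is what makes the scan selective.
    """
    if a >= 0:
        raise ValueError("a (the continuous-time A) must be negative")
    h = 0.0
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size, >= 0
        a_bar = math.exp(delta * a)     # decay factor between 0 and 1
        b_bar = delta * (w_b * x)       # simplified discretization: delta * B_t
        h = a_bar * h + b_bar * x       # recurrent state update
        ys.append((w_c * x) * h)        # input-dependent readout C_t * h_t
    return ys


def mlp_confidence(feature: Sequence[float],
                   w1: Sequence[Sequence[float]], b1: Sequence[float],
                   w2: Sequence[float], b2: float) -> float:
    """Hypothetical two-layer MLP: ReLU hidden layer, sigmoid output score."""
    hidden = [max(0.0, sum(wi * fi for wi, fi in zip(row, feature)) + bi)
              for row, bi in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return sigmoid(logit)
```

For example, a caller could compare the output of `mlp_confidence` against a chosen threshold to decide which predictions to trust; how MambaGlue actually uses its confidence score is not described in this abstract. The scan processes tokens in a single linear-time pass, which is the property usually credited for Mamba's speed relative to quadratic-cost attention.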