CV | AI | Jun 22, 2025

Multimodal Fusion SLAM with Fourier Attention

arXiv:2506.18204v2 | 2 citations | h-index: 5 | Has Code | IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

This work addresses robustness in SLAM for security robots and similar applications, though it appears incremental as it builds on existing multimodal fusion and attention techniques.

The paper tackles visual SLAM in noisy, variably lit, and dark environments by proposing FMF-SLAM, an efficient multimodal fusion method built on Fourier-based attention. It reports state-of-the-art accuracy at real-time speed on the TUM and TartanAir datasets.

Visual SLAM is particularly challenging in environments affected by noise, varying lighting conditions, and darkness. Learning-based optical flow algorithms can leverage multiple modalities to address these challenges, but traditional optical flow-based visual SLAM approaches often require significant computational resources. To overcome this limitation, we propose FMF-SLAM, an efficient multimodal fusion SLAM method that uses the fast Fourier transform (FFT) to improve efficiency. Specifically, we introduce a novel Fourier-based self-attention and cross-attention mechanism to extract features from RGB and depth signals. We further enhance the interaction of multimodal features by incorporating multi-scale knowledge distillation across modalities. We also demonstrate the practical feasibility of FMF-SLAM in real-world scenarios, with real-time performance, by integrating it on a security robot and fusing it with a GNSS-RTK global positioning module and global bundle adjustment. Our approach is validated on video sequences from TUM, TartanAir, and our own real-world datasets, showing state-of-the-art performance under noisy, variable-lighting, and dark conditions. Our code and datasets are available at https://github.com/youjie-zhou/FMF-SLAM.git.
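
The abstract does not spell out the Fourier-based attention layers, so the following is only a minimal sketch of the general idea, not the FMF-SLAM implementation. It assumes token-by-channel feature matrices for RGB and depth. The self-mixing step follows the FNet-style recipe (real part of a 2D FFT). The cross-modal step is a swapped-in stand-in: a circular cross-correlation of the two modalities computed in the frequency domain. The function names `fourier_self_mix` and `fourier_cross_mix` are hypothetical.

```python
import numpy as np


def fourier_self_mix(x):
    """FNet-style token mixing (assumed stand-in for Fourier self-attention).

    x has shape (tokens, channels). A 2D FFT mixes information across both
    axes in O(N log N); only the real part is kept.
    """
    return np.real(np.fft.fft2(x, axes=(-2, -1)))


def fourier_cross_mix(rgb, depth):
    """Hypothetical cross-modal mixing via circular cross-correlation.

    Multiplying the RGB spectrum by the conjugate depth spectrum along the
    token axis equals, per channel,
        out[k] = sum_n rgb[(n + k) mod N] * depth[n],
    computed in O(N log N) instead of O(N^2).
    """
    n_tokens = rgb.shape[0]
    rgb_spec = np.fft.rfft(rgb, axis=0)
    depth_spec = np.fft.rfft(depth, axis=0)
    return np.fft.irfft(rgb_spec * np.conj(depth_spec), n=n_tokens, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_feat = rng.standard_normal((8, 4))    # 8 tokens, 4 channels (toy sizes)
    depth_feat = rng.standard_normal((8, 4))
    print(fourier_self_mix(rgb_feat).shape)
    print(fourier_cross_mix(rgb_feat, depth_feat).shape)
```

The appeal of this family of layers is cost: every token interacts with every other token through the FFT without forming the quadratic attention matrix, which is consistent with the efficiency motivation stated in the abstract.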

Code Implementations: 1 repo
