CV · Nov 3, 2024

MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

arXiv:2411.01399v13.72 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses deformable multi-modal image registration for medical or remote sensing applications. It is an incremental improvement built on a novel hybrid architecture.

The paper tackles the problem of aligning multi-modal images with feature discrepancies in deformable registration by proposing MambaReg, a Mamba-based architecture that disentangles alignment features from non-alignment features. The method outperforms existing approaches on RGB-IR datasets, achieving higher registration accuracy and smoother deformation fields.

Precise alignment of multi-modal images with inherent feature discrepancies is a pivotal challenge in deformable image registration. Traditional learning-based approaches often treat registration networks as black boxes without interpretability. One core insight is that disentangling alignment features from non-alignment features across modalities is beneficial. Meanwhile, prominent image registration methods such as convolutional neural networks struggle to capture long-range dependencies because of their local receptive fields. These methods often fail when the given image pair has a large misalignment, because they do not effectively learn long-range dependencies and correspondences. In this paper, we propose MambaReg, a novel Mamba-based architecture that leverages Mamba's strong capability for modeling long sequences to address these challenges. With several proposed sub-modules, MambaReg can effectively disentangle modality-independent features responsible for registration from modality-dependent, non-aligning features. By selectively attending to the relevant features, our network adeptly captures the correlation between multi-modal images, enabling focused deformation field prediction and precise image alignment. The Mamba-based architecture seamlessly integrates the local feature extraction power of convolutional layers with the long-range dependency modeling capabilities of Mamba. Experiments on public non-rigid RGB-IR image datasets demonstrate the superiority of our method, which outperforms existing approaches in terms of registration accuracy and deformation field smoothness.
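The abstract reports deformation field smoothness but does not say how it is measured. As a rough illustration only, the sketch below computes one common diffusion-style penalty: the mean squared difference between neighboring displacement vectors. The function name `smoothness_penalty` and the list-of-rows field layout are assumptions made for this example, not the paper's implementation.

```python
def smoothness_penalty(field):
    """Mean squared forward difference of a 2-D displacement field.

    `field` is a list of rows; each entry is a (dx, dy) displacement in
    pixels. Lower values indicate a smoother field. This layout is a
    hypothetical choice for illustration, not taken from the paper.
    """
    height, width = len(field), len(field[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            # Compare each vector with its lower and right neighbors.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width:
                    for c in range(2):  # dx component, then dy component
                        diff = field[ny][nx][c] - field[y][x][c]
                        total += diff * diff
                        count += 1
    return total / count if count else 0.0


# A uniform shift is perfectly smooth; a checkerboard field is not.
uniform = [[(1.0, 0.0)] * 3 for _ in range(3)]
jagged = [[(float((x + y) % 2), 0.0) for x in range(3)] for y in range(3)]

print(smoothness_penalty(uniform))  # 0.0
print(smoothness_penalty(jagged))   # 0.5
```

Under this measure, a registration method that produces the same alignment with a lower penalty yields a more regular, physically plausible warp.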
