CV · Aug 15, 2024

MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation

arXiv:2408.08070v2 · 13 citations · h-index: 8 · Has Code
AI Analysis

This paper addresses the challenge of leveraging Mamba's long-sequence capabilities for medical imaging, though it appears incremental, as it adapts existing masked image modeling to state space models.

The authors address pre-training of Mamba state space models for medical image segmentation by proposing MambaMIM, a masked image modeling method with a token interpolation strategy. The model is pre-trained on 6.8K CT scans and evaluated on eight public segmentation benchmarks, with state-of-the-art results reported for a hybrid MedNeXt and Vision Mamba architecture.

Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba's potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel TOKen-Interpolation strategy (TOKI) for the selective structure state space sequence, which learns causal relationships of state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy to maintain masking consistency across different architectures and can be used on any single or hybrid Mamba architecture to enhance its multi-scale and long-range representation capability. We pre-train MambaMIM on a large-scale dataset of 6.8K CT scans and evaluate its performance across eight public medical segmentation benchmarks. Extensive downstream experiments reveal the feasibility and advancement of using Mamba for medical image pre-training. In particular, when we apply MambaMIM to a customized architecture that hybridizes MedNeXt and Vision Mamba, we consistently obtain state-of-the-art segmentation performance. The code is available at: https://github.com/FengheTan9/MambaMIM.
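
The idea of filling masked positions from their visible neighbours can be illustrated with a toy example. This is a minimal sketch, not the authors' TOKI: the abstract does not specify how interpolation is done in state space, so plain linear interpolation over a 1D sequence of scalar tokens is swapped in here as a stand-in. The function name `mask_and_interpolate` and its parameters are hypothetical.

```python
import random


def mask_and_interpolate(tokens, mask_ratio=0.5, seed=0):
    """Randomly mask positions in a 1D token sequence, then fill each masked
    position by linear interpolation between its nearest visible neighbours.

    Assumption: scalar tokens and linear interpolation are illustrative
    stand-ins; they are not taken from the MambaMIM paper.
    """
    n = len(tokens)
    if n == 0:
        return [], []

    rng = random.Random(seed)
    # Always keep at least one visible token so interpolation has an anchor.
    n_mask = max(0, min(int(n * mask_ratio), n - 1))
    masked = set(rng.sample(range(n), n_mask))
    visible = [i for i in range(n) if i not in masked]

    filled = list(tokens)
    for i in sorted(masked):
        left = max((j for j in visible if j < i), default=None)
        right = min((j for j in visible if j > i), default=None)
        if left is None:
            # No visible token before position i: copy the next visible one.
            filled[i] = tokens[right]
        elif right is None:
            # No visible token after position i: copy the previous visible one.
            filled[i] = tokens[left]
        else:
            # Weight by relative distance to the two visible neighbours.
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * tokens[left] + w * tokens[right]
    return filled, sorted(masked)


if __name__ == "__main__":
    sequence = [float(i) for i in range(10)]
    filled, masked = mask_and_interpolate(sequence, mask_ratio=0.5, seed=0)
    print("masked positions:", masked)
    print("filled sequence: ", filled)
```

In a real pipeline the tokens would be embedding vectors of flattened 3D patches and the filled positions would feed a decoder trained to reconstruct the masked voxels; the sketch only shows the masking and neighbour-based filling step.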

Code Implementations · 2 repos
