SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
This work is aimed at researchers and practitioners who face scarce labeled data in medical image segmentation, and it offers an incremental improvement over existing SAM variants.
The paper tackles the scarcity of labeled data for SAM-based medical image segmentation. It proposes SAMora, a framework that uses hierarchical self-supervised pre-training to capture multi-scale information. SAMora achieves state-of-the-art performance on datasets such as Synapse and reduces fine-tuning epochs by 90%.
The Segment Anything Model (SAM) has demonstrated significant potential in medical image segmentation. However, its performance is limited when only a small amount of labeled data is available, even though medical data contains abundant, valuable, and often overlooked hierarchical information. To address this limitation, we draw inspiration from self-supervised learning and propose SAMora, a framework that captures hierarchical medical knowledge by applying complementary self-supervised learning objectives at the image, patch, and pixel levels. To fully exploit the complementarity of the hierarchical knowledge held in the LoRAs, we introduce HL-Attn, a hierarchical fusion module that integrates multi-scale features while preserving their distinct characteristics. SAMora is compatible with various SAM variants, including SAM2, SAMed, and H-SAM. Experimental results on the Synapse, LA, and PROMISE12 datasets show that SAMora outperforms existing SAM variants. It achieves state-of-the-art performance in both few-shot and fully supervised settings while reducing fine-tuning epochs by 90%. The code is available at https://github.com/ShChen233/SAMora.
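The abstract does not specify how HL-Attn combines the level-specific LoRAs, so the following is only a minimal sketch of one plausible reading: three low-rank updates (image, patch, pixel) are computed for each token and mixed with softmax attention weights, so each level's update stays separate until the final weighted sum. The function names, the scoring vector `q`, the shapes, and the random weights are all illustrative assumptions, not the authors' implementation; the released repository is the authoritative reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def lora_delta(x, A, B):
    """Low-rank update for tokens x: (n, d) @ (d, r) @ (r, d) -> (n, d)."""
    return x @ A @ B

def fuse_lora_branches(x, W, branches, q):
    """Hypothetical attention-style fusion of per-level LoRA updates.

    x: (n, d) token features; W: (d, d) frozen base weight;
    branches: list of (A, B) pairs, one per level (image, patch, pixel);
    q: (d,) scoring vector, an assumed stand-in for a learned query.
    """
    # One update per level, kept separate: (n, k, d)
    deltas = np.stack([lora_delta(x, A, B) for A, B in branches], axis=1)
    # Score each level's update per token, then softmax over the k levels
    scores = deltas @ q / np.sqrt(x.shape[1])           # (n, k)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of the level-specific updates, added to the frozen path
    fused = (weights[..., None] * deltas).sum(axis=1)    # (n, d)
    return x @ W + fused, weights

# Toy usage with random values (not real SAM features)
n, d, r = 4, 8, 2
x = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))
branches = [(rng.normal(size=(d, r)), rng.normal(size=(r, d))) for _ in range(3)]
q = rng.normal(size=d)
out, weights = fuse_lora_branches(x, W, branches, q)
```

In this sketch each row of `weights` sums to 1, so every token receives a convex combination of the image-, patch-, and pixel-level updates rather than a plain average.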