Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation
This work addresses the scarcity of labeled data in medical imaging, a problem that affects both researchers and practitioners. It represents an incremental advance built on hybrid methods.
The paper tackles medical image segmentation with a semi-supervised learning framework that combines Vision Transformers (ViTs) and convolutional neural networks (CNNs) with information from the vision and language modalities. It reports state-of-the-art results on multiple datasets.
In this paper, we introduce a novel semi-supervised learning framework tailored for medical image segmentation. Central to our approach is the Multi-scale Text-aware ViT-CNN Fusion scheme, which combines ViTs and CNNs to exploit the distinct advantages of both architectures as well as the complementary information in the vision and language modalities. We further propose the Multi-Axis Consistency framework for generating robust pseudo labels, which strengthens the semi-supervised learning process. Extensive experiments on several widely used datasets demonstrate the efficacy of our approach.
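The abstract does not describe how the Multi-Axis Consistency framework actually produces pseudo labels. Purely as an illustration of the general idea of consistency-based pseudo-labeling between a CNN branch and a ViT branch, the sketch below averages the two branches' per-pixel class probabilities and keeps a pseudo label only where both branches agree and the fused confidence exceeds a threshold. The function name `consistency_pseudo_labels`, the averaging rule, and the 0.9 threshold are hypothetical assumptions, not the paper's method.

```python
from typing import List, Optional

# Per-pixel class probabilities, shape [num_pixels][num_classes].
Probs = List[List[float]]


def argmax(p: List[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(p)), key=lambda k: p[k])


def consistency_pseudo_labels(
    cnn_probs: Probs, vit_probs: Probs, threshold: float = 0.9
) -> List[Optional[int]]:
    """Hypothetical sketch: fuse two branches and keep only consistent, confident pixels.

    Returns a class index per pixel, or None where the CNN and ViT branches
    disagree or the averaged confidence is below `threshold` (such pixels are
    left out of the unsupervised loss).
    """
    if len(cnn_probs) != len(vit_probs):
        raise ValueError("both branches must predict the same number of pixels")
    labels: List[Optional[int]] = []
    for p_cnn, p_vit in zip(cnn_probs, vit_probs):
        # Simple average of the two branches (an assumed fusion rule).
        fused = [(a + b) / 2 for a, b in zip(p_cnn, p_vit)]
        c = argmax(fused)
        # Require cross-branch agreement on the predicted class.
        agree = argmax(p_cnn) == argmax(p_vit) == c
        labels.append(c if agree and fused[c] >= threshold else None)
    return labels


if __name__ == "__main__":
    cnn = [[0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]
    vit = [[0.97, 0.03], [0.3, 0.7], [0.05, 0.95]]
    print(consistency_pseudo_labels(cnn, vit))  # [0, None, 1]
```

In this toy example the second pixel is masked out because the branches disagree, which is the intuition behind using consistency to filter noisy pseudo labels. A real segmentation pipeline would operate on tensors and may enforce consistency along several axes rather than only across two branches.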