CV | Jul 24, 2023

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

arXiv:2307.12591v1 | 19.873 citations | h-index: 134 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in medical imaging for researchers and practitioners, offering an incremental improvement over existing self-supervised methods.

The paper tackles the challenge of limited pre-training data in medical image segmentation by introducing SwinMM, a multi-view self-supervised pipeline that improves accuracy and data-efficiency, outperforming the previous state-of-the-art method Swin UNETR on several tasks.

Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses the potential of multi-view information by incorporating two principal components. In the pre-training phase, we deploy a masked multi-view encoder that is trained concurrently on masked multi-view observations through a range of proxy tasks. These tasks span image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder is developed to aggregate the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks. It allows for a smooth integration of multi-view information, significantly boosting both the accuracy and data-efficiency of the model. Code and models are available at https://github.com/UCSC-VLAA/SwinMM/.
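To make the mutual-learning idea in the abstract more concrete, here is a minimal sketch of a cross-view consistency loss in plain Python. It is not the SwinMM implementation: the abstract does not say which loss is used, so the symmetric KL divergence, the axis-permutation views, and every name below (`permute_axes`, `inverse_axes`, `bernoulli_kl`, `mutual_consistency_loss`, `toy_predict`) are assumptions chosen for illustration. The sketch only shows the general pattern: predict in each view, map the predictions back to a shared frame, and penalize disagreement.

```python
import math
from itertools import product


def permute_axes(vol, axes):
    """Reorder the axes of a nested-list volume, like numpy.transpose."""
    shape = (len(vol), len(vol[0]), len(vol[0][0]))
    d, h, w = (shape[a] for a in axes)
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for idx in product(*(range(n) for n in shape)):
        i, j, k = (idx[a] for a in axes)
        out[i][j][k] = vol[idx[0]][idx[1]][idx[2]]
    return out


def inverse_axes(axes):
    """Return the permutation that undoes `axes`."""
    inv = [0] * len(axes)
    for position, axis in enumerate(axes):
        inv[axis] = position
    return tuple(inv)


def bernoulli_kl(p, q, eps=1e-7):
    """KL divergence between two Bernoulli distributions, with clamping."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_consistency_loss(pred_a, pred_b):
    """Mean symmetric KL between two aligned voxel-wise probability maps."""
    total, count = 0.0, 0
    for plane_a, plane_b in zip(pred_a, pred_b):
        for row_a, row_b in zip(plane_a, plane_b):
            for p, q in zip(row_a, row_b):
                total += 0.5 * (bernoulli_kl(p, q) + bernoulli_kl(q, p))
                count += 1
    return total / count


def toy_predict(view):
    """Hypothetical stand-in for a network: foreground probability per voxel.

    The slice index is mixed in so that the output depends on the view,
    which gives the consistency loss something to penalize.
    """
    return [
        [[1.0 / (1.0 + math.exp(-(v + 0.1 * s))) for v in row] for row in plane]
        for s, plane in enumerate(view)
    ]


# Toy volume with shape (D=4, H=3, W=2); the values are arbitrary.
volume = [[[(d + 2 * h + 3 * w) / 10.0 for w in range(2)]
           for h in range(3)] for d in range(4)]

axes = (2, 0, 1)                       # hypothetical second view
view_b = permute_axes(volume, axes)    # same data, different orientation

pred_a = toy_predict(volume)
# Map the second view's prediction back into the first view's frame.
pred_b = permute_axes(toy_predict(view_b), inverse_axes(axes))

loss = mutual_consistency_loss(pred_a, pred_b)
print(f"cross-view consistency loss: {loss:.6f}")
```

In a real pipeline, `toy_predict` would be replaced by the shared encoder applied to each masked view, and a term of this kind would be added to the other proxy-task losses. The key design point is that predictions must be re-aligned to a common orientation before they are compared voxel by voxel.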
