IV · CV · Jan 22, 2024

MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

De-Xing Huang, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zhi-Chao Lai, Zeng-Guang Hou

arXiv:2401.11856v4 · 8.52 citations · h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in medical image segmentation for clinical applications by improving inter-slice fusion, though it is incremental over existing 2.5D approaches.

The paper tackles the suboptimal performance of 2.5D medical image segmentation models by proposing MOSformer, which uses dual encoders and an inter-slice fusion transformer to integrate inter-slice information more effectively. It reports state-of-the-art Dice similarity coefficients (DSC) of 85.63%, 92.19%, and 85.43% on three benchmark datasets (Synapse, ACDC, and AMOS).
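The DSC figures are Dice similarity coefficients: for a predicted mask P and a ground-truth mask G, DSC = 2|P ∩ G| / (|P| + |G|). On multi-organ benchmarks such as Synapse, the headline number is typically an average of per-class Dice over the foreground classes. The minimal sketch below assumes exactly that (per-class Dice on flattened label maps, averaged over the listed classes, with a class absent from both maps counted as 1.0). It is not the paper's evaluation code, and the names `dice` and `mean_dice` are hypothetical.

```python
from typing import Iterable, Sequence


def dice(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Binary Dice for one class label over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    p = [v == label for v in pred]
    t = [v == label for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Assumed convention: a class absent from both maps scores a perfect 1.0.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Iterable[int]) -> float:
    """Average per-class Dice over the given foreground labels (caller excludes background)."""
    scores = [dice(pred, target, c) for c in labels]
    if not scores:
        raise ValueError("at least one label is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 1D "volumes": 0 = background, 1 and 2 = two organ classes.
    pred = [0, 1, 1, 2, 2, 0]
    target = [0, 1, 2, 2, 2, 0]
    print(dice(pred, target, 1))  # 2*1 / (2+1)
    print(dice(pred, target, 2))  # 2*2 / (2+3) = 0.8
    print(mean_dice(pred, target, [1, 2]))
```

Real pipelines usually compute this on 3D arrays or tensors per volume, but the arithmetic is the same.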

Medical image segmentation plays an important role in various clinical applications. 2.5D-based segmentation models bridge the computational efficiency of 2D-based models and the spatial perception capabilities of 3D-based models. However, existing 2.5D-based models primarily adopt a single encoder to extract features of both the target and neighboring slices. This design fails to fuse inter-slice information effectively and results in suboptimal segmentation performance. In this study, a novel momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue by leveraging inter-slice information from multi-scale feature maps extracted by different encoders. Specifically, dual encoders are employed to enhance feature distinguishability among different slices, and one of the encoders is updated by moving averaging to maintain consistent slice representations. Moreover, an inter-slice fusion transformer (IF-Trans) module is developed to fuse inter-slice multi-scale features. MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), achieving new state-of-the-art DSC scores of 85.63%, 92.19%, and 85.43%, respectively. These results demonstrate MOSformer's competitiveness in medical image segmentation.
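The abstract states only that one of the two encoders is "moving-averaged". A common reading, familiar from MoCo-style momentum encoders, is an exponential moving average (EMA) of the gradient-trained encoder's weights: θ_k ← m·θ_k + (1 − m)·θ_q. The minimal sketch below assumes that form. The coefficient `m`, the flat dict-of-lists parameter layout, and the name `momentum_update` are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def momentum_update(online: Params, momentum: Params, m: float = 0.99) -> Params:
    """Return EMA-updated momentum-encoder weights.

    Assumed rule (MoCo-style): theta_k <- m * theta_k + (1 - m) * theta_q,
    where theta_q are the gradient-trained (online) weights and theta_k are
    the momentum encoder's weights, which receive no gradients themselves.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum coefficient m must lie in [0, 1]")
    if online.keys() != momentum.keys():
        raise ValueError("both encoders must have the same parameter names")
    updated: Params = {}
    for name, q_vals in online.items():
        k_vals = momentum[name]
        if len(q_vals) != len(k_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        updated[name] = [m * k + (1.0 - m) * q for q, k in zip(q_vals, k_vals)]
    return updated


if __name__ == "__main__":
    online = {"stem.weight": [1.0, 2.0]}
    momentum = {"stem.weight": [0.0, 0.0]}
    # Repeated updates pull the momentum weights gradually toward the online ones.
    for step in range(3):
        momentum = momentum_update(online, momentum, m=0.5)
        print(step, momentum["stem.weight"])
    # prints 0 [0.5, 1.0], then 1 [0.75, 1.5], then 2 [0.875, 1.75]
```

With `m` close to 1, the momentum encoder changes only slightly per step. Under this assumption, that is one way the second encoder could keep slice representations consistent across training iterations.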
