MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation
This work addresses weak inter-slice fusion, a bottleneck for 2.5D medical image segmentation in clinical applications, though it is incremental over existing 2.5D approaches.
The paper tackles suboptimal segmentation performance in 2.5D-based medical image models by proposing MOSformer, which uses dual encoders and a fusion transformer to better integrate inter-slice information. It reports state-of-the-art DSC of 85.63%, 92.19%, and 85.43% on three benchmark datasets.
Medical image segmentation plays an important role in various clinical applications. 2.5D-based segmentation models combine the computational efficiency of 2D-based models with the spatial perception capabilities of 3D-based models. However, existing 2.5D-based models primarily adopt a single encoder to extract features of both target and neighborhood slices and therefore fail to fuse inter-slice information effectively, which results in suboptimal segmentation performance. In this study, a novel momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue by leveraging inter-slice information from multi-scale feature maps extracted by different encoders. Specifically, dual encoders are employed to enhance feature distinguishability among different slices. One of the encoders is moving-averaged to maintain consistent slice representations. Moreover, an inter-slice fusion transformer (IF-Trans) module is developed to fuse inter-slice multi-scale features. MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), achieving new state-of-the-art Dice similarity coefficient (DSC) scores of 85.63%, 92.19%, and 85.43%, respectively. These results demonstrate MOSformer's competitiveness in medical image segmentation.
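The abstract states that one of the two encoders is moving-averaged but does not give the update rule. The sketch below assumes the common exponential-moving-average form used for momentum encoders, in which each momentum-encoder weight is pulled toward the corresponding weight of the other (trained) encoder. The function name `ema_update`, the flat list-of-floats parameter layout, and the coefficient `m` are illustrative assumptions, not details taken from the paper.

```python
def ema_update(momentum_params, online_params, m=0.99):
    """Return momentum-encoder weights after one moving-average step.

    Assumed rule (not confirmed by the source):
        theta_momentum <- m * theta_momentum + (1 - m) * theta_online
    Parameters are plain lists of floats to keep the sketch dependency-free.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if len(momentum_params) != len(online_params):
        raise ValueError("both encoders must have the same number of parameters")
    return [m * k + (1.0 - m) * q for k, q in zip(momentum_params, online_params)]


# Toy weights for the trained encoder and a zero-initialised momentum encoder.
online = [1.0, -2.0, 0.5]
momentum = [0.0, 0.0, 0.0]

# A small m is used here only so the drift is visible after a few steps.
for _ in range(3):
    momentum = ema_update(momentum, online, m=0.5)

print(momentum)  # [0.875, -1.75, 0.4375]
```

With a coefficient close to 1, the momentum encoder changes slowly between training steps, which is one plausible reading of how moving-averaging helps "maintain consistent slice representations".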