CV | Jan 2, 2025

nnY-Net: Swin-NeXt with Cross-Attention for 3D Medical Images Segmentation

Haixu Liu, Zerui Tao, Wenzhen Dong, Qiuzhuang Sun

arXiv:2501.01406v1 | 3.6 | h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses segmentation accuracy in medical imaging for applications like tumor detection, though it is incremental as it builds on existing SOTA models.

The paper tackles 3D medical image segmentation by proposing nnY-Net, a model that integrates Swin Transformer and ConvNeXt with a cross-attention module, achieving competitive results on benchmarks like the BraTS dataset with Dice scores around 0.85.

This paper proposes a novel 3D medical image segmentation architecture called nnY-Net. The name comes from the fact that our model adds a cross-attention module at the bottom of the U-Net structure to form a Y shape. We combine the advantages of two recent SOTA models, MedNeXt and SwinUNETR, using a Swin Transformer as the encoder and ConvNeXt as the decoder to form the Swin-NeXt structure. Our model uses the lowest-level feature map of the encoder as Key and Value and uses patient features such as pathology and treatment information as Query to compute the attention weights in a cross-attention module. Moreover, we simplify some pre- and post-processing steps as well as data augmentation methods for 3D image segmentation based on the DynUNet and nnU-Net frameworks, and we integrate the proposed Swin-NeXt with cross-attention into this framework. Finally, we construct a DiceFocalCELoss to improve training efficiency under the uneven data convergence of voxel classification.
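The two mechanisms the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It uses NumPy in place of a deep-learning framework to stay self-contained, assumes single-head scaled dot-product attention, and assumes DiceFocalCELoss is a weighted sum of Dice, focal, and cross-entropy terms. The function names, projection matrices (`w_q`, `w_k`, `w_v`), tensor shapes, `gamma`, and the loss weights are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(patient_feats, bottleneck, w_q, w_k, w_v):
    """Single-head cross-attention (assumed form).

    patient_feats: (n_q, d_p) patient feature tokens, used as Query.
    bottleneck:    (c, d, h, w) lowest-level encoder feature map,
                   used as Key and Value.
    """
    c = bottleneck.shape[0]
    tokens = bottleneck.reshape(c, -1).T      # (n_vox, c), one token per voxel
    q = patient_feats @ w_q                   # (n_q, d)
    k = tokens @ w_k                          # (n_vox, d)
    v = tokens @ w_v                          # (n_vox, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_vox)
    weights = softmax(scores, axis=-1)        # each query row sums to 1
    return weights @ v, weights


def dice_focal_ce_loss(logits, target, gamma=2.0, eps=1e-5,
                       weights=(1.0, 1.0, 1.0)):
    """Weighted sum of Dice, focal, and cross-entropy terms (assumed form).

    logits: (n_classes, n_vox) raw scores; target: (n_vox,) integer labels.
    """
    n_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    onehot = np.eye(n_classes)[target].T      # (n_classes, n_vox)
    p_true = (probs * onehot).sum(axis=0)     # probability of the true class
    log_p = np.log(p_true + 1e-12)
    ce = -log_p.mean()
    focal = (-((1.0 - p_true) ** gamma) * log_p).mean()
    inter = (probs * onehot).sum(axis=1)
    denom = probs.sum(axis=1) + onehot.sum(axis=1)
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()
    w_dice, w_focal, w_ce = weights
    return w_dice * dice + w_focal * focal + w_ce * ce
```

In this sketch the attention output has one row per patient-feature token, so the clinical branch reads from the image bottleneck rather than the other way round. How the paper feeds that output back into the decoder is not stated in the abstract and is left out here.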
