CVJan 7, 2025

CFFormer: Cross CNN-Transformer Channel Attention and Spatial Feature Fusion for Improved Segmentation of Heterogeneous Medical Images

Jiaxuan Li, Qing Xu, Xiangjian He, Ziyu Liu, Daokun Zhang, Ruili Wang, Rong Qu, Guoping Qiu

arXiv:2501.03629v211.8Has Code

Originality Incremental advance

AI Analysis

This addresses segmentation challenges in medical imaging for diagnosis, but is incremental as it builds on existing hybrid architectures.

The authors tackled medical image segmentation under heterogeneous image quality by proposing CFFormer, a hybrid CNN-Transformer model, which outperformed state-of-the-art methods across eight datasets covering five modalities.

Medical image segmentation plays an important role in computer-aided diagnosis. Existing methods mainly utilize spatial attention to highlight the region of interest. However, due to limitations of medical imaging devices, medical images exhibit significant heterogeneity, posing challenges for segmentation. Ultrasound images, for instance, often suffer from speckle noise, low resolution, and poor contrast between target tissues and background, which may lead to inaccurate boundary delineation. To address these challenges caused by heterogeneous image quality, we propose a hybrid CNN-Transformer model,called CFFormer, which leverages effective channel feature extraction to enhance the model' s ability to accurately identify tissue regions by capturing rich contextual information. The proposed architecture contains two key components: the Cross Feature Channel Attention (CFCA) module and the X-Spatial Feature Fusion (XFF) module. The model incorporates dual encoders, with the CNN encoder focusing on capturing local features and the Transformer encoder modeling global features. The CFCA module filters and facilitates interactions between the channel features from the two encoders, while the XFF module effectively reduces the significant semantic information differences in spatial features, enabling a smooth and cohesive spatial feature fusion. We evaluate our model across eight datasets covering five modalities to test its generalization capability. Experimental results demonstrate that our model outperforms current state-of-the-art methods and maintains accurate tissue region segmentation across heterogeneous medical image datasets. The code is available at https://github.com/JiaxuanFelix/CFFormer.

View on arXiv PDF Code

Similar