CV | Apr 23, 2024

CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

arXiv:2404.15451v2 | h-index: 1 | IJCNN
Originality: Incremental advance
AI Analysis

This work addresses efficiency and performance in medical image segmentation, though it appears incremental as it builds on existing transformer and feature pyramid concepts.

The authors propose CFPFormer, a transformer decoder built on feature pyramids, to improve feature decoding for medical image segmentation. With a ResNet50 backbone it reaches a 92.02% Dice Score, and its VGG-based variant outperforms baselines that use more complex ViT and Swin Transformer backbones.
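For readers unfamiliar with the metric, the Dice Score quoted above measures the overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration for binary masks stored as flat 0/1 lists; the function name `dice_score` and the empty-mask convention are assumptions for this example, not taken from the paper, and the paper may average the score per class or per case in a different way.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total foreground pixels across both masks.
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"{dice_score(pred, target):.4f}")  # prints 0.6667
```

On this scale, a 92.02% Dice Score corresponds to a value of 0.9202.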

Feature pyramids have been widely adopted in convolutional neural networks and transformers for medical image segmentation. However, existing models generally focus on the encoder-side Transformer for feature extraction. We further explore the potential of improving the feature decoder with a well-designed architecture. We propose the Cross Feature Pyramid Transformer decoder (CFPFormer), a novel decoder block that integrates feature pyramids and transformers. Even though transformer-like architectures deliver outstanding performance in segmentation, concerns about redundancy and training cost remain. Specifically, by leveraging patch embedding and cross-layer feature concatenation mechanisms, CFPFormer enhances feature extraction capabilities, while its complexity is mitigated by our Gaussian Attention. Benefiting from the Transformer structure and U-shaped connections, our work is capable of capturing long-range dependencies and effectively up-sampling feature maps. Experimental results on medical image segmentation datasets demonstrate the effectiveness of CFPFormer. With a ResNet50 backbone, our method achieves a 92.02% Dice Score. Notably, our VGG-based model outperforms baselines with more complex ViT and Swin Transformer backbones.
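The abstract names Gaussian Attention but does not define it. As a rough intuition only, the sketch below shows one generic way a Gaussian term can enter attention: a distance-based bias is added to the scaled dot-product logits so that each position attends mostly to its neighbours. Everything here is an assumption for illustration (the function name `gaussian_biased_attention`, the 1D positions, the `sigma` parameter); it is not the paper's formulation, and how the paper's variant reduces complexity is not stated in the abstract.

```python
import math


def gaussian_biased_attention(queries, keys, values, sigma=1.0):
    """Softmax attention over a 1D sequence with a Gaussian distance bias.

    Hypothetical illustration: logit(i, j) = q_i . k_j / sqrt(d)
                                             - (i - j)^2 / (2 * sigma^2)
    Returns (outputs, weights), both as nested lists.
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        logits = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            # Log of an unnormalised Gaussian centred on position i.
            bias = -((i - j) ** 2) / (2.0 * sigma ** 2)
            logits.append(dot + bias)
        # Numerically stable softmax over the biased logits.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of value vectors, channel by channel.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# With all-zero queries the dot products vanish, so the weights are purely
# Gaussian in the distance |i - j|.
q = [[0.0, 0.0] for _ in range(3)]
k = [[0.0, 0.0] for _ in range(3)]
v = [[1.0], [2.0], [3.0]]
out, w = gaussian_biased_attention(q, k, v, sigma=1.0)
print(w[0])
```

A smaller `sigma` concentrates each row of weights more tightly around the diagonal, which is the locality effect this kind of bias is meant to convey.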

