CV · Mar 3, 2025

Primus: Enforcing Attention Usage for 3D Medical Image Segmentation

Tassilo Wald, Saikat Roy, Fabian Isensee, Constantin Ulrich, Sebastian Ziegler, Dasha Trofimova, Raphael Stock, Michael Baumgartner, Gregor Köhler, Klaus Maier-Hein

arXiv:2503.01835v1 · 22.820 citations · h-index: 65

Originality: Incremental advance

AI Analysis

This work addresses segmentation accuracy in medical imaging applications and represents an incremental step towards making Transformers state-of-the-art in this domain.

The paper tackles the limited impact of Transformers on 3D medical image segmentation by introducing Primus, a fully Transformer-based architecture that surpasses current Transformer-based methods and competes with state-of-the-art convolutional models on multiple public datasets.

Transformers have achieved remarkable success across multiple fields, yet their impact on 3D medical image segmentation remains limited, with convolutional networks still dominating major benchmarks. In this work, we a) analyze current Transformer-based segmentation models and identify critical shortcomings, particularly their over-reliance on convolutional blocks. Further, we demonstrate that in some architectures performance is unaffected by the absence of the Transformer, which shows how little the Transformer contributes in these models. To address these challenges, we move away from hybrid architectures and b) introduce a fully Transformer-based segmentation architecture, termed Primus. Primus combines high-resolution tokens with advances in positional embeddings and block design to make maximal use of its Transformer blocks. Through these adaptations, Primus surpasses current Transformer-based methods and competes with state-of-the-art convolutional models on multiple public datasets. By doing so, we create the first pure Transformer architecture and take a significant step towards making Transformers state-of-the-art for 3D medical image segmentation.
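To give a feel for what "high-resolution tokens" implies, the following is a minimal sketch of how the token count of a 3D volume grows as the patch size shrinks. The crop shape and patch sizes are illustrative assumptions and are not taken from the paper; `token_count` and `attention_pairs` are hypothetical helper names.

```python
from math import prod


def token_count(volume_shape, patch_size):
    """Number of non-overlapping patch tokens for a 3D volume.

    Both arguments are (depth, height, width) tuples. The values used
    below are illustrative assumptions, not settings from the paper.
    """
    if any(dim % p for dim, p in zip(volume_shape, patch_size)):
        raise ValueError("each volume dimension must be divisible by the patch size")
    return prod(dim // p for dim, p in zip(volume_shape, patch_size))


def attention_pairs(n_tokens):
    """Token-to-token pairs scored by one full self-attention layer."""
    return n_tokens ** 2


crop = (128, 128, 128)  # hypothetical input crop
for patch in [(16, 16, 16), (8, 8, 8)]:
    n = token_count(crop, patch)
    print(f"patch {patch}: {n} tokens, {attention_pairs(n)} attention pairs")
```

Under these assumed sizes, halving the patch edge multiplies the token count by 8 and the number of attention pairs by 64, which is the cost a pure Transformer design pays for finer tokens.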
