IV · CV · Apr 12, 2025

Multi-Modal Brain Tumor Segmentation via 3D Multi-Scale Self-attention and Cross-attention

arXiv:2504.09088v1 · 11 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses brain tumor segmentation in 3D multi-modality images, a task that is crucial for medical diagnosis and treatment planning. The advance is incremental because it builds on existing CNN-Transformer hybrid architectures.

The paper tackles 3D multi-modality brain tumor segmentation by proposing TMA-TransBTS, a CNN-Transformer hybrid model that addresses the limitation of fixed receptive fields in self-attention layers. On three public datasets, it achieves higher averaged segmentation results than previous state-of-the-art methods.

Due to the success of CNN-based and Transformer-based models in various computer vision tasks, recent works study the applicability of CNN-Transformer hybrid architectures to 3D multi-modality medical segmentation tasks. Introducing a Transformer gives hybrid models the ability to model long-range dependencies in 3D medical images via the self-attention mechanism. However, these models usually employ fixed receptive fields over 3D volumetric features within each self-attention layer, ignoring multi-scale volumetric lesion features. To address this issue, we propose a CNN-Transformer hybrid 3D medical image segmentation model, named TMA-TransBTS, based on an encoder-decoder structure. TMA-TransBTS simultaneously extracts multi-scale 3D features and models long-distance dependencies through multi-scale division and aggregation of 3D tokens in a self-attention layer. Furthermore, TMA-TransBTS introduces a 3D multi-scale cross-attention module that links the encoder and the decoder, extracting rich volume representations by exploiting the mutual attention mechanism of cross-attention and multi-scale aggregation of 3D tokens. Extensive experimental results on three public 3D medical segmentation datasets show that TMA-TransBTS achieves higher averaged segmentation results than previous state-of-the-art CNN-based 3D methods and CNN-Transformer hybrid 3D methods for the segmentation of 3D multi-modality brain tumors.
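
The idea of aggregating 3D tokens at several scales inside one attention layer can be illustrated with a minimal pure-Python sketch. It is not the TMA-TransBTS implementation: the average pooling, the scales (1, 2), the omission of learned query/key/value projections, and every function name below are assumptions chosen for illustration. Keys and values are built by pooling a D x H x W token grid at each scale and concatenating the results, so every query sees both fine and coarse context. Taking the queries from the same volume gives a self-attention flavor; taking them from a second (decoder) volume gives a cross-attention flavor.

```python
import math
import random


def avg_pool_tokens(vol, s):
    """Average-pool a D x H x W grid of C-dim tokens with a cubic window of
    side s (stride s) and return the pooled tokens as a flat list."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    C = len(vol[0][0][0])
    tokens = []
    for d in range(0, D, s):
        for h in range(0, H, s):
            for w in range(0, W, s):
                cell = [vol[i][j][k]
                        for i in range(d, min(d + s, D))
                        for j in range(h, min(h + s, H))
                        for k in range(w, min(w + s, W))]
                tokens.append([sum(t[c] for t in cell) / len(cell)
                               for c in range(C)])
    return tokens


def multi_scale_tokens(vol, scales=(1, 2)):
    """Concatenate tokens pooled at every scale (hypothetical scale choice)."""
    tokens = []
    for s in scales:
        tokens.extend(avg_pool_tokens(vol, s))
    return tokens


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, attention weights)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(wi * v[c] for wi, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


def make_volume(D, H, W, C, seed):
    """Random stand-in for a feature volume (toy data, not real features)."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-1, 1) for _ in range(C)]
              for _ in range(W)] for _ in range(H)] for _ in range(D)]


enc = make_volume(4, 4, 4, 8, seed=0)  # stand-in for encoder features
dec = make_volume(4, 4, 4, 8, seed=1)  # stand-in for decoder features

kv = multi_scale_tokens(enc)           # 64 fine tokens + 8 coarse tokens

# Self-attention flavor: queries come from the same (encoder) volume.
self_out, self_w = attention(avg_pool_tokens(enc, 1), kv, kv)

# Cross-attention flavor: decoder queries attend to multi-scale encoder tokens.
cross_out, cross_w = attention(avg_pool_tokens(dec, 1), kv, kv)
```

In this sketch the output keeps one token per query position (64 here), while the key/value set grows only by the small number of coarse tokens. A real model would add learned projections, multiple heads, and positional information.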

