CV AIJan 20

Decoder-Free Supervoxel GNN for Accurate Brain-Tumor Localization in Multi-Modal MRI

Andrea Protani, Marc Molina Van Den Bosch, Lorenzo Giusti, Heloisa Barbosa Da Silva, Paolo Cacace, Albert Sund Aillet, Miguel Angel Gonzalez Ballester, Friedhelm Hummel, Luigi Serio

arXiv:2601.14055v12.81 citationsh-index: 4

Originality Highly original

AI Analysis

This work addresses the need for accurate and interpretable brain tumor localization in MRI for medical diagnosis, presenting a novel method that is incremental in improving efficiency over existing encoder-decoder structures.

The paper tackled the problem of inefficient parameter allocation in 3D medical imaging by introducing SVGFormer, a decoder-free pipeline using supervoxels and a hierarchical encoder, which achieved a F1-score of 0.875 for classification and a MAE of 0.028 for regression on the BraTS dataset.

Modern vision backbones for 3D medical imaging typically process dense voxel grids through parameter-heavy encoder-decoder structures, a design that allocates a significant portion of its parameters to spatial reconstruction rather than feature learning. Our approach introduces SVGFormer, a decoder-free pipeline built upon a content-aware grouping stage that partitions the volume into a semantic graph of supervoxels. Its hierarchical encoder learns rich node representations by combining a patch-level Transformer with a supervoxel-level Graph Attention Network, jointly modeling fine-grained intra-region features and broader inter-regional dependencies. This design concentrates all learnable capacity on feature encoding and provides inherent, dual-scale explainability from the patch to the region level. To validate the framework's flexibility, we trained two specialized models on the BraTS dataset: one for node-level classification and one for tumor proportion regression. Both models achieved strong performance, with the classification model achieving a F1-score of 0.875 and the regression model a MAE of 0.028, confirming the encoder's ability to learn discriminative and localized features. Our results establish that a graph-based, encoder-only paradigm offers an accurate and inherently interpretable alternative for 3D medical image representation.

View on arXiv PDF

Similar