CV · Jan 26

Efficient Complex-Valued Vision Transformers for MRI Classification Directly from k-Space

arXiv:2601.18392v1 · h-index: 5
AI Analysis

This work addresses the need for resource-efficient AI analysis in MRI by enabling direct processing from scanners, though it is incremental in adapting existing architectures to a new domain.

The authors propose a complex-valued Vision Transformer that classifies MRI scans directly from raw k-space data, achieving performance competitive with state-of-the-art image-domain methods while reducing VRAM consumption during training by up to 68×.

Deep learning applications in Magnetic Resonance Imaging (MRI) predominantly operate on reconstructed magnitude images, a process that discards phase information and requires computationally expensive transforms. Standard neural network architectures rely on local operations (convolutions or grid-patches) that are ill-suited for the global, non-local nature of raw frequency-domain (k-Space) data. In this work, we propose a novel complex-valued Vision Transformer (kViT) designed to perform classification directly on k-Space data. To bridge the geometric disconnect between current architectures and MRI physics, we introduce a radial k-Space patching strategy that respects the spectral energy distribution of the frequency domain. Extensive experiments on the fastMRI and in-house datasets demonstrate that our approach achieves classification performance competitive with state-of-the-art image-domain baselines (ResNet, EfficientNet, ViT). Crucially, kViT exhibits superior robustness to high acceleration factors and offers a paradigm shift in computational efficiency, reducing VRAM consumption during training by up to 68× compared to standard methods. This establishes a pathway for resource-efficient, direct-from-scanner AI analysis.
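To make the idea of radial patching concrete, the sketch below groups the samples of a centred k-space grid into concentric rings ordered from low to high spatial frequency, so that each ring could serve as one complex-valued token. This is an illustration only, not the paper's implementation: the function name `radial_patches`, the equal-count ring rule, and the assumption that the zero-frequency sample sits at the grid centre are all hypothetical choices made here, since the abstract does not specify how kViT forms its patches.

```python
import math


def radial_patches(kspace, num_rings):
    """Group centred k-space samples into concentric rings (hypothetical helper).

    kspace    : 2D list of complex values, zero frequency assumed at the centre.
    num_rings : number of rings (tokens) to produce.
    Returns a list of `num_rings` lists of complex samples, ordered from the
    lowest to the highest spatial frequencies.
    """
    rows, cols = len(kspace), len(kspace[0])
    cy, cx = rows // 2, cols // 2

    # Sort every grid position by its distance from the k-space centre.
    order = sorted(
        (math.hypot(y - cy, x - cx), y, x)
        for y in range(rows)
        for x in range(cols)
    )

    # Assumption: split the sorted samples into rings of (almost) equal size,
    # so inner rings are thin and outer rings are wide in radius.
    n = len(order)
    rings = []
    for i in range(num_rings):
        start = i * n // num_rings
        stop = (i + 1) * n // num_rings
        rings.append([kspace[y][x] for _, y, x in order[start:stop]])
    return rings


# Toy 4x4 grid whose value encodes its own (row, column) position.
demo = [[complex(y, x) for x in range(4)] for y in range(4)]
rings = radial_patches(demo, 4)
```

Each ring keeps its complex values intact, so both magnitude and phase remain available to a downstream complex-valued model. Other ring rules, such as equal radial width, would be equally valid choices for this sketch.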

