CV · AI · Apr 6

Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

arXiv:2604.05171 · 69.1 · h-index: 39
Predicted impact: top 46% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the need for scalable generative modeling in medical image analysis by improving multimodal brain MRI reconstruction, though it is an incremental advance that builds on existing VAE and VQ-VAE methods.

The paper tackles the problem of learning robust multimodal brain MRI representations by proposing NeuroQuant, a modality-aware and anatomically grounded 3D vector-quantized VAE that achieves higher reconstruction fidelity than existing VAEs on two datasets.
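
The vector-quantization step at the core of a VQ-VAE can be sketched as a nearest-neighbour lookup: each voxel's feature vector in the 3D latent volume is replaced by the closest entry of a codebook. The NumPy sketch below is a generic illustration, not NeuroQuant's implementation; the function name `vector_quantize`, the (C, D, H, W) layout, the codebook size, and the random stand-in volumes are all assumptions. During training, VQ-VAEs usually pass gradients through the non-differentiable argmin with a straight-through estimator, which this forward-only sketch omits.

```python
import numpy as np


def vector_quantize(z, codebook):
    """Replace each voxel feature vector with its nearest codebook entry.

    z:        encoder output volume, shape (C, D, H, W)  (assumed layout)
    codebook: code vectors, shape (K, C)
    Returns the quantized volume (C, D, H, W) and code indices (D, H, W).
    """
    c, d, h, w = z.shape
    flat = z.reshape(c, -1).T  # (N, C): one row per voxel
    # Squared Euclidean distance from every voxel vector to every code vector.
    dists = (
        (flat ** 2).sum(axis=1, keepdims=True)
        - 2.0 * flat @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )  # (N, K)
    idx = dists.argmin(axis=1)  # index of the nearest code for each voxel
    z_q = codebook[idx].T.reshape(c, d, h, w)
    return z_q, idx.reshape(d, h, w)


# Usage: one codebook shared by two hypothetical modality-specific anatomy volumes.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 8))      # K=32 codes of dimension C=8 (illustrative)
anat_t1 = rng.normal(size=(8, 4, 4, 4))  # stand-in for T1 anatomical features
anat_t2 = rng.normal(size=(8, 4, 4, 4))  # stand-in for T2 anatomical features
q_t1, codes_t1 = vector_quantize(anat_t1, codebook)
q_t2, codes_t2 = vector_quantize(anat_t2, codebook)
```

Because a single codebook array serves both inputs, the code indices from either modality live in the same discrete space, which is one way to read the "shared codebook" the abstract describes.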

Learning a robust Variational Autoencoder (VAE) is a fundamental step for many deep learning applications in medical image analysis, such as MRI synthesis. Existing brain VAEs predominantly focus on single-modality data (e.g., T1-weighted MRI), overlooking the complementary diagnostic value of other modalities such as T2-weighted MRI. Here, we propose a modality-aware and anatomically grounded 3D vector-quantized VAE (VQ-VAE) for reconstructing multimodal brain MRIs. The model, called NeuroQuant, first learns a shared latent representation across modalities using factorized multi-axis attention, which can capture relationships between distant brain regions. It then employs a dual-stream 3D encoder that explicitly separates the encoding of modality-invariant anatomical structures from modality-dependent appearance. Next, the anatomical encoding is discretized using a shared codebook and combined with modality-specific appearance features via Feature-wise Linear Modulation (FiLM) during decoding. The entire approach is trained with a joint 2D/3D strategy to account for the slice-based acquisition of 3D MRI data. Extensive experiments on two multimodal brain MRI datasets demonstrate that NeuroQuant achieves higher reconstruction fidelity than existing VAEs, providing a scalable foundation for downstream generative modeling and cross-modal brain image analysis.
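
Feature-wise Linear Modulation (FiLM) conditions a feature map by applying a per-channel scale (gamma) and shift (beta) predicted from a conditioning input. The NumPy sketch below shows how one anatomical volume could be modulated differently per modality. It assumes that the appearance features are a single vector per modality, that a single linear layer (the hypothetical `AppearanceFiLM` class) maps that vector to (gamma, beta), and that the anatomical volume has already been quantized; the abstract does not confirm these details, and the random arrays stand in for learned weights and encoder outputs.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise Linear Modulation: per-channel scale and shift.

    features:    feature volume, shape (C, D, H, W)
    gamma, beta: per-channel scale and shift, shape (C,)
    """
    return gamma[:, None, None, None] * features + beta[:, None, None, None]


class AppearanceFiLM:
    """Hypothetical conditioning layer: appearance vector -> (gamma, beta).

    A single linear layer with random weights stands in for a learned network.
    """

    def __init__(self, appearance_dim, channels, rng):
        self.channels = channels
        self.weight = rng.normal(scale=0.1, size=(appearance_dim, 2 * channels))
        # Bias chosen so a zero appearance vector gives gamma=1, beta=0 (identity).
        self.bias = np.concatenate([np.ones(channels), np.zeros(channels)])

    def __call__(self, appearance):
        params = appearance @ self.weight + self.bias  # shape (2 * C,)
        return params[: self.channels], params[self.channels :]


# Usage: one (stand-in) quantized anatomy volume modulated with two appearances.
rng = np.random.default_rng(0)
anatomy = rng.normal(size=(8, 4, 4, 4))  # shared anatomical features, C=8
to_film = AppearanceFiLM(appearance_dim=16, channels=8, rng=rng)
t1_appearance = rng.normal(size=16)  # hypothetical T1 appearance code
t2_appearance = rng.normal(size=16)  # hypothetical T2 appearance code
decoder_in_t1 = film(anatomy, *to_film(t1_appearance))
decoder_in_t2 = film(anatomy, *to_film(t2_appearance))
```

Setting the bias so that gamma starts at 1 and beta at 0 makes the modulation begin as an identity map, one reasonable choice for this kind of conditioning; the two decoder inputs then differ only through the (gamma, beta) predicted from each appearance vector, while the anatomy stays fixed.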
