CV · AI · Dec 10, 2025

MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata

arXiv:2512.10041v1 · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the need for flexible, unified AI models in medical applications, though it is incremental as it builds on existing diffusion methods.

Existing methods typically model a separate conditional distribution for each task that links medical images and clinical metadata. The authors introduce MetaVoxel, a joint diffusion modeling framework that unifies image generation, age estimation, and sex prediction in a single model, achieving performance comparable to task-specific baselines on more than 10,000 MRI scans.

Modern deep learning methods have achieved impressive results across tasks ranging from classifying disease and estimating continuous biomarkers to generating realistic medical images. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative joint diffusion modeling framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than 10,000 T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.
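One common way to obtain conditional predictions from a joint diffusion model is replacement-style conditioning, as used in diffusion inpainting: at each reverse step, the observed variables are overwritten with a noised copy of their known values, and only the unobserved variables are left to the denoiser. The abstract does not say whether MetaVoxel conditions this way, so the sketch below is an illustration under that assumption, not the authors' method. The vector layout (four "voxels" followed by age and sex slots) and the function names `forward_noise` and `clamp_observed` are hypothetical.

```python
import numpy as np

def forward_noise(x0, alpha_bar, rng):
    """Sample x_t from N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def clamp_observed(x_t, x0_known, observed, alpha_bar, rng):
    """Overwrite observed entries of the joint state with a noised copy
    of their known clean values; leave unobserved entries untouched."""
    return np.where(observed, forward_noise(x0_known, alpha_bar, rng), x_t)

rng = np.random.default_rng(0)

# Hypothetical joint vector: 4 image values, then age and sex.
# The metadata slots hold placeholders because they are unobserved here.
x0_known = np.array([0.2, -0.1, 0.5, 0.3, 0.0, 0.0])
observed = np.array([True, True, True, True, False, False])

# Current noisy joint state at some reverse step (assumed alpha_bar = 0.5).
x_t = rng.standard_normal(6)
x_t_cond = clamp_observed(x_t, x0_known, observed, alpha_bar=0.5, rng=rng)
```

Flipping the mask so that the metadata is observed and the image entries are not would turn the same routine into metadata-conditioned image generation, which is the sense in which a single joint model can serve several predictive directions.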

