A Unified Framework for Multimodal Image Reconstruction and Synthesis using Denoising Diffusion Models
This work addresses the need for simpler training and deployment workflows in multimodal imaging by replacing task-specific models with a single framework. The contribution is incremental, since it builds on existing denoising diffusion models.
The paper tackles incomplete multimodal imaging data by introducing Any2all, a unified framework that formulates image reconstruction and synthesis as a single virtual inpainting problem. On a PET/MR/CT brain dataset, it achieves competitive distortion-based performance and superior perceptual quality compared to specialized methods.
Image reconstruction and image synthesis are important for handling incomplete multimodal imaging data, but existing methods require a separate task-specific model for each task, which complicates training and deployment workflows. We introduce Any2all, a unified framework that addresses this limitation by formulating these disparate tasks as a single virtual inpainting problem. We train a single, unconditional diffusion model on the complete multimodal data stack. This model is then adapted at inference time to ``inpaint'' all target modalities from any combination of available inputs, whether clean images or noisy measurements. We validate Any2all on a PET/MR/CT brain dataset. Our results show that Any2all performs well on both multimodal reconstruction and synthesis tasks, consistently yielding images with competitive distortion-based performance and superior perceptual quality compared to specialized methods.
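To make the virtual inpainting idea concrete, the sketch below treats the modalities as channels of one image stack and conditions a sampling step on the channels that are available. It is an illustration under stated assumptions, not the method of the paper: the abstract does not say how the model is adapted at inference time, so the replacement-style conditioning shown here (in the spirit of RePaint-like inpainting) is a stand-in. The channel order, the names `channel_mask` and `inpaint_step`, and the `denoise_fn` callable are all hypothetical, and only clean input images are covered, not noisy measurements.

```python
import numpy as np

# Assumed channel order of the multimodal stack (hypothetical).
MODALITIES = ("PET", "MR", "CT")


def channel_mask(available, shape):
    """Build a (C, H, W) mask: 1 for observed modalities, 0 for targets."""
    mask = np.zeros((len(MODALITIES), *shape), dtype=np.float32)
    for i, name in enumerate(MODALITIES):
        if name in available:
            mask[i] = 1.0
    return mask


def inpaint_step(x_t, known, mask, alpha_bar_prev, denoise_fn, rng):
    """One replacement-style conditioning step (an assumed scheme).

    x_t            -- current noisy stack, shape (C, H, W)
    known          -- clean stack; only channels with mask == 1 are used
    alpha_bar_prev -- cumulative noise-schedule product at the next step
    denoise_fn     -- hypothetical reverse step of the unconditional model
    """
    # Missing channels: take the model's unconditional reverse step.
    x_prev_unknown = denoise_fn(x_t)
    # Observed channels: re-noise the clean data to the matching noise level.
    noise = rng.standard_normal(known.shape).astype(np.float32)
    x_prev_known = (
        np.sqrt(alpha_bar_prev) * known
        + np.sqrt(1.0 - alpha_bar_prev) * noise
    )
    # Merge: observed channels come from data, the rest from the model.
    return mask * x_prev_known + (1.0 - mask) * x_prev_unknown
```

Under this scheme, changing the task (for example, synthesizing CT from MR, or PET and CT from MR) only changes the set passed to `channel_mask`; the trained model itself is unchanged, which is the sense in which a single unconditional model can serve any combination of inputs.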