3D Wavelet Latent Diffusion Model for Whole-Body MR-to-CT Modality Translation
This work targets a bottleneck in clinical workflows: MR-to-CT translation for accurate estimation of radiation attenuation. The contribution appears incremental, since it builds on existing diffusion models and adds specific architectural modifications.
The paper tackles poor spatial alignment and insufficient image quality in whole-body MR-to-CT synthesis for clinical applications such as PET/MR imaging and radiation therapy. It proposes a 3D Wavelet Latent Diffusion Model intended to improve the representation of bony structures and soft-tissue contrast.
Magnetic Resonance (MR) imaging plays an essential role in contemporary clinical diagnostics. It is increasingly integrated into advanced therapeutic workflows, such as hybrid Positron Emission Tomography/Magnetic Resonance (PET/MR) imaging and MR-only radiation therapy. These integrated approaches are critically dependent on accurate estimation of radiation attenuation, which is typically facilitated by synthesizing Computed Tomography (CT) images from MR scans to generate attenuation maps. However, existing MR-to-CT synthesis methods for whole-body imaging often suffer from poor spatial alignment between the generated CT and input MR images, and insufficient image quality for reliable use in downstream clinical tasks. In this paper, we present a novel 3D Wavelet Latent Diffusion Model (3D-WLDM) that addresses these limitations by performing modality translation in a learned latent space. By incorporating a Wavelet Residual Module into the encoder-decoder architecture, we enhance the capture and reconstruction of fine-scale features across image and latent spaces. To preserve anatomical integrity during the diffusion process, we disentangle structural and modality-specific characteristics and anchor the structural component to prevent warping. We also introduce a Dual Skip Connection Attention mechanism within the diffusion model, enabling the generation of high-resolution CT images with improved representation of bony structures and soft-tissue contrast.
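To make the role of wavelets in capturing fine-scale features concrete, the following is a minimal sketch of a single-level 3D Haar wavelet decomposition and its inverse in NumPy. It is not the paper's Wavelet Residual Module, whose internals are not described here. The choice of the Haar basis, the function names (`haar_split`, `haar_merge`, `haar_dwt3d`, `haar_idwt3d`), and the random test volume are all assumptions made for illustration, and the sketch assumes every volume dimension is even.

```python
import numpy as np

def haar_split(x, axis):
    """Split x along `axis` into low-pass and high-pass halves (orthonormal Haar)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_merge(low, high, axis):
    """Invert haar_split: rebuild the even/odd samples and interleave them."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    out = np.stack([even, odd], axis=axis + 1)  # pair each even sample with its odd neighbour
    shape = list(low.shape)
    shape[axis] *= 2
    return out.reshape(shape)

def haar_dwt3d(vol):
    """One decomposition level: returns 8 subbands keyed 'LLL' ... 'HHH'."""
    bands = {"": vol}
    for axis in range(3):
        nxt = {}
        for key, arr in bands.items():
            lo, hi = haar_split(arr, axis)
            nxt[key + "L"] = lo
            nxt[key + "H"] = hi
        bands = nxt
    return bands

def haar_idwt3d(bands):
    """Reassemble the volume from the 8 subbands, undoing the axes in reverse order."""
    for axis in (2, 1, 0):
        prefixes = {key[:-1] for key in bands}
        bands = {p: haar_merge(bands[p + "L"], bands[p + "H"], axis) for p in prefixes}
    return bands[""]

# Hypothetical stand-in for an MR or CT volume (illustration only).
rng = np.random.default_rng(0)
vol = rng.standard_normal((8, 8, 8))
bands = haar_dwt3d(vol)
recon = haar_idwt3d(bands)
```

In this sketch the `LLL` subband is a half-resolution approximation of the volume, while the other seven subbands hold directional detail (edges and fine texture). Because the transform is orthonormal, the decomposition is exactly invertible, which is one reason wavelet subbands are a plausible carrier of fine-scale information between image space and a compressed latent space.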