IVSep 30, 2024
Volumetric Conditional Score-based Residual Diffusion Model for PET/MR DenoisingSiyeop Yoon, Rui Hu, Yuang Wang et al.
PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes. The necessity for PET denoising arises from the intrinsic high noise levels in PET imaging, which can significantly hinder the accurate interpretation and quantitative analysis of the scans. With advances in deep learning techniques, diffusion model-based PET denoising techniques have shown remarkable performance improvement. However, these models often face limitations when applied to volumetric data. Additionally, many existing diffusion models do not adequately consider the unique characteristics of PET imaging, such as its 3D volumetric nature, leading to the potential loss of anatomic consistency. Our Conditional Score-based Residual Diffusion (CSRD) model addresses these issues by incorporating a refined score function and 3D patch-wise training strategy, optimizing the model for efficient volumetric PET denoising. The CSRD model significantly lowers computational demands and expedites the denoising process. By effectively integrating volumetric data from PET and MRI scans, the CSRD model maintains spatial coherence and anatomical detail. Lastly, we demonstrate that the CSRD model achieves superior denoising performance in both qualitative and quantitative evaluations while maintaining image details and outperforms existing state-of-the-art methods.
CVSep 26, 2024
High-Fidelity 3D Lung CT Synthesis in ARDS Swine Models Using Score-Based 3D Residual Diffusion ModelsSiyeop Yoon, Yujin Oh, Xiang Li et al.
Acute respiratory distress syndrome (ARDS) is a severe condition characterized by lung inflammation and respiratory failure, with a high mortality rate of approximately 40%. Traditional imaging methods, such as chest X-rays, provide only two-dimensional views, limiting their effectiveness in fully assessing lung pathology. Three-dimensional (3D) computed tomography (CT) offers a more comprehensive visualization, enabling detailed analysis of lung aeration, atelectasis, and the effects of therapeutic interventions. However, the routine use of CT in ARDS management is constrained by practical challenges and risks associated with transporting critically ill patients to remote scanners. In this study, we synthesize high-fidelity 3D lung CT from 2D generated X-ray images with associated physiological parameters using a score-based 3D residual diffusion model. Our preliminary results demonstrate that this approach can produce high-quality 3D CT images that are validated with ground truth, offering a promising solution for enhancing ARDS management.
IVFeb 2, 2025Code
Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic PerspectiveYujin Oh, Pengfei Jin, Sangjoon Park et al.
Ensuring fairness in medical image segmentation is critical due to biases in imbalanced clinical data acquisition caused by demographic attributes (e.g., age, sex, race) and clinical factors (e.g., disease severity). To address these challenges, we introduce Distribution-aware Mixture of Experts (dMoE), inspired by optimal control theory. We provide a comprehensive analysis of its underlying mechanisms and clarify dMoE's role in adapting to heterogeneous distributions in medical image segmentation. Furthermore, we integrate dMoE into multiple network architectures, demonstrating its broad applicability across diverse medical image analysis tasks. By incorporating demographic and clinical factors, dMoE achieves state-of-the-art performance on two 2D benchmark datasets and a 3D in-house dataset. Our results highlight the effectiveness of dMoE in mitigating biases from imbalanced distributions, offering a promising approach to bridging control theory and medical image segmentation within fairness learning paradigms. The source code will be made available. The source code is available at https://github.com/tvseg/dMoE.
IVMar 10, 2024
Implicit Image-to-Image Schrodinger Bridge for Image RestorationYuang Wang, Siyeop Yoon, Pengfei Jin et al.
Diffusion-based models have demonstrated remarkable effectiveness in image restoration tasks; however, their iterative denoising process, which starts from Gaussian noise, often leads to slow inference speeds. The Image-to-Image Schrödinger Bridge (I$^2$SB) offers a promising alternative by initializing the generative process from corrupted images while leveraging training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schrödinger Bridge (I$^3$SB) to further accelerate the generative process of I$^2$SB. I$^3$SB restructures the generative process into a non-Markovian framework by incorporating the initial corrupted image at each generative step, effectively preserving and utilizing its information. To enable direct use of pretrained I$^2$SB models without additional training, we ensure consistency in marginal distributions. Extensive experiments across many image corruptions, including noise, low resolution, JPEG compression, and sparse sampling, and multiple image modalities, such as natural, human face, and medical images, demonstrate the acceleration benefits of I$^3$SB. Compared to I$^2$SB, I$^3$SB achieves the same perceptual quality with fewer generative steps, while maintaining or improving fidelity to the ground truth.
IVMar 11, 2024
Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory PredictionQing Xiao, Siyeop Yoon, Hui Ren et al.
Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression. Accurately forecasting CTh trajectories can significantly enhance early diagnosis and intervention strategies, providing timely care. However, the longitudinal data essential for these studies often suffer from temporal sparsity and incompleteness, presenting substantial challenges in modeling the disease's progression accurately. Existing methods are limited, focusing primarily on datasets without missing entries or requiring predefined assumptions about CTh progression. To overcome these obstacles, we propose a conditional score-based diffusion model specifically designed to generate CTh trajectories with the given baseline information, such as age, sex, and initial diagnosis. Our conditional diffusion model utilizes all available data during the training phase to make predictions based solely on baseline information during inference without needing prior history about CTh progression. The prediction accuracy of the proposed CTh prediction pipeline using a conditional score-based model was compared for sub-groups consisting of cognitively normal, mild cognitive impairment, and AD subjects. The Bland-Altman analysis shows our diffusion-based prediction model has a near-zero bias with narrow 95% confidential interval compared to the ground-truth CTh in 6-36 months. In addition, our conditional diffusion model has a stochastic generative nature, therefore, we demonstrated an uncertainty analysis of patient-specific CTh prediction through multiple realizations.
CVOct 26, 2025
Projection Embedded Diffusion Bridge for CT Reconstruction from Incomplete DataYuang Wang, Pengfei Jin, Siyeop Yoon et al.
Reconstructing CT images from incomplete projection data remains challenging due to the ill-posed nature of the problem. Diffusion bridge models have recently shown promise in restoring clean images from their corresponding Filtered Back Projection (FBP) reconstructions, but incorporating data consistency into these models remains largely underexplored. Incorporating data consistency can improve reconstruction fidelity by aligning the reconstructed image with the observed projection data, and can enhance detail recovery by integrating structural information contained in the projections. In this work, we propose the Projection Embedded Diffusion Bridge (PEDB). PEDB introduces a novel reverse stochastic differential equation (SDE) to sample from the distribution of clean images conditioned on both the FBP reconstruction and the incomplete projection data. By explicitly conditioning on the projection data in sampling the clean images, PEDB naturally incorporates data consistency. We embed the projection data into the score function of the reverse SDE. Under certain assumptions, we derive a tractable expression for the posterior score. In addition, we introduce a free parameter to control the level of stochasticity in the reverse process. We also design a discretization scheme for the reverse SDE to mitigate discretization error. Extensive experiments demonstrate that PEDB achieves strong performance in CT reconstruction from three types of incomplete data, including sparse-view, limited-angle, and truncated projections. For each of these types, PEDB outperforms evaluated state-of-the-art diffusion bridge models across standard, noisy, and domain-shift evaluations.
LGJun 30, 2025
System-Embedded Diffusion Bridge ModelsBartlomiej Sobieski, Matthew Tivnan, Yuang Wang et al.
Solving inverse problems -- recovering signals from incomplete or noisy measurements -- is fundamental in science and engineering. Score-based generative models (SGMs) have recently emerged as a powerful framework for this task. Two main paradigms have formed: unsupervised approaches that adapt pretrained generative models to inverse problems, and supervised bridge methods that train stochastic processes conditioned on paired clean and corrupted data. While the former typically assume knowledge of the measurement model, the latter have largely overlooked this structural information. We introduce System embedded Diffusion Bridge Models (SDBs), a new class of supervised bridge methods that explicitly embed the known linear measurement system into the coefficients of a matrix-valued SDE. This principled integration yields consistent improvements across diverse linear inverse problems and demonstrates robust generalization under system misspecification between training and deployment, offering a promising solution to real-world applications.
IVMay 28, 2025
Surf2CT: Cascaded 3D Flow Matching Models for Torso 3D CT Synthesis from Skin SurfaceSiyeop Yoon, Yujin Oh, Pengfei Jin et al.
We present Surf2CT, a novel cascaded flow matching framework that synthesizes full 3D computed tomography (CT) volumes of the human torso from external surface scans and simple demographic data (age, sex, height, weight). This is the first approach capable of generating realistic volumetric internal anatomy images solely based on external body shape and demographics, without any internal imaging. Surf2CT proceeds through three sequential stages: (1) Surface Completion, reconstructing a complete signed distance function (SDF) from partial torso scans using conditional 3D flow matching; (2) Coarse CT Synthesis, generating a low-resolution CT volume from the completed SDF and demographic information; and (3) CT Super-Resolution, refining the coarse volume into a high-resolution CT via a patch-wise conditional flow model. Each stage utilizes a 3D-adapted EDM2 backbone trained via flow matching. We trained our model on a combined dataset of 3,198 torso CT scans (approximately 1.13 million axial slices) sourced from Massachusetts General Hospital (MGH) and the AutoPET challenge. Evaluation on 700 paired torso surface-CT cases demonstrated strong anatomical fidelity: organ volumes exhibited small mean percentage differences (range from -11.1% to 4.4%), and muscle/fat body composition metrics matched ground truth with strong correlation (range from 0.67 to 0.96). Lung localization had minimal bias (mean difference -2.5 mm), and surface completion significantly improved metrics (Chamfer distance: from 521.8 mm to 2.7 mm; Intersection-over-Union: from 0.87 to 0.98). Surf2CT establishes a new paradigm for non-invasive internal anatomical imaging using only external data, opening opportunities for home-based healthcare, preventive medicine, and personalized clinical assessments without the risks associated with conventional imaging techniques.
IVMay 28, 2025
Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from DemographicsSiyeop Yoon, Sifan Song, Pengfei Jin et al.
We propose a cascaded 3D diffusion model framework to synthesize high-fidelity 3D PET/CT volumes directly from demographic variables, addressing the growing need for realistic digital twins in oncologic imaging, virtual trials, and AI-driven data augmentation. Unlike deterministic phantoms, which rely on predefined anatomical and metabolic templates, our method employs a two-stage generative process. An initial score-based diffusion model synthesizes low-resolution PET/CT volumes from demographic variables alone, providing global anatomical structures and approximate metabolic activity. This is followed by a super-resolution residual diffusion model that refines spatial resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET dataset and evaluated using organ-wise volume and standardized uptake value (SUV) distributions, comparing synthetic and real data between demographic subgroups. The organ-wise comparison demonstrated strong concordance between synthetic and real images. In particular, most deviations in metabolic uptake values remained within 3-5% of the ground truth in subgroup analysis. These findings highlight the potential of cascaded 3D diffusion models to generate anatomically and metabolically accurate PET/CT images, offering a robust alternative to traditional phantoms and enabling scalable, population-informed synthetic imaging for clinical and research applications.
CVMay 8, 2025
OWT: A Foundational Organ-Wise Tokenization Framework for Medical ImagingSifan Song, Siyeop Yoon, Pengfei Jin et al.
Recent advances in representation learning often rely on holistic embeddings that entangle multiple semantic components, limiting interpretability and generalization. These issues are especially critical in medical imaging, where downstream tasks depend on anatomically interpretable features. To address these limitations, we propose an Organ-Wise Tokenization (OWT) framework with a Token Group-based Reconstruction (TGR) training paradigm. Unlike conventional approaches, OWT explicitly disentangles an image into separable token groups, each corresponding to a distinct organ or semantic entity. Our design ensures each token group encapsulates organ-specific information, boosting interpretability, generalization, and efficiency while enabling fine-grained control for targeted clinical applications. Experiments on CT and MRI datasets demonstrate OWT's power: it not only achieves strong performance on standard tasks like image reconstruction and segmentation, but also unlocks novel, high-impact clinical capabilities including organ-specific tumor identification, organ-level retrieval and semantic-level generation, without requiring any additional training. These findings underscore the potential of OWT as a foundational framework for semantically disentangled representation learning, offering broad scalability and a new perspective on how representations can be leveraged.
IVMar 6, 2025
Prediction of Frozen Region Growth in Kidney Cryoablation Intervention Using a 3D Flow-Matching ModelSiyeop Yoon, Yujin Oh, Matthew Tivnan et al.
This study presents a 3D flow-matching model designed to predict the progression of the frozen region (iceball) during kidney cryoablation. Precise intraoperative guidance is critical in cryoablation to ensure complete tumor eradication while preserving adjacent healthy tissue. However, conventional methods, typically based on physics driven or diffusion based simulations, are computationally demanding and often struggle to represent complex anatomical structures accurately. To address these limitations, our approach leverages intraoperative CT imaging to inform the model. The proposed 3D flow matching model is trained to learn a continuous deformation field that maps early-stage CT scans to future predictions. This transformation not only estimates the volumetric expansion of the iceball but also generates corresponding segmentation masks, effectively capturing spatial and morphological changes over time. Quantitative analysis highlights the model robustness, demonstrating strong agreement between predictions and ground-truth segmentations. The model achieves an Intersection over Union (IoU) score of 0.61 and a Dice coefficient of 0.75. By integrating real time CT imaging with advanced deep learning techniques, this approach has the potential to enhance intraoperative guidance in kidney cryoablation, improving procedural outcomes and advancing the field of minimally invasive surgery.
CVFeb 13, 2025
Noise Controlled CT Super-Resolution with Conditional Diffusion ModelYuang Wang, Siyeop Yoon, Rui Hu et al.
Improving the spatial resolution of CT images is a meaningful yet challenging task, often accompanied by the issue of noise amplification. This article introduces an innovative framework for noise-controlled CT super-resolution utilizing the conditional diffusion model. The model is trained on hybrid datasets, combining noise-matched simulation data with segmented details from real data. Experimental results with real CT images validate the effectiveness of our proposed framework, showing its potential for practical applications in CT imaging.
IVMay 23, 2024
Measurement Embedded Schrödinger Bridge for Inverse ProblemsYuang Wang, Pengfei Jin, Siyeop Yoon et al.
Score-based diffusion models are frequently employed as structural priors in inverse problems. However, their iterative denoising process, initiated from Gaussian noise, often results in slow inference speeds. The Image-to-Image Schrödinger Bridge (I$^2$SB), which begins with the corrupted image, presents a promising alternative as a prior for addressing inverse problems. In this work, we introduce the Measurement Embedded Schrödinger Bridge (MESB). MESB establishes Schrödinger Bridges between the distribution of corrupted images and the distribution of clean images given observed measurements. Based on optimal transport theory, we derive the forward and backward processes of MESB. Through validation on diverse inverse problems, our proposed approach exhibits superior performance compared to existing Schrödinger Bridge-based inverse problems solvers in both visual quality and quantitative metrics.