IVMar 18, 2022Code
Rethinking the optimization process for self-supervised model-driven MRI reconstructionWeijian Huang, Cheng Li, Wenxin Fan et al.
Recovering high-quality images from undersampled measurements is critical for accelerated MRI reconstruction. Recently, various supervised deep learning-based MRI reconstruction methods have been developed. Despite the achieved promising performances, these methods require fully sampled reference data, the acquisition of which is resource-intensive and time-consuming. Self-supervised learning has emerged as a promising solution to alleviate the reliance on fully sampled datasets. However, existing self-supervised methods suffer from reconstruction errors due to the insufficient constraint enforced on the non-sampled data points and the error accumulation happened alongside the iterative image reconstruction process for model-driven deep learning reconstrutions. To address these challenges, we propose K2Calibrate, a K-space adaptation strategy for self-supervised model-driven MR reconstruction optimization. By iteratively calibrating the learned measurements, K2Calibrate can reduce the network's reconstruction deterioration caused by statistically dependent noise. Extensive experiments have been conducted on the open-source dataset FastMRI, and K2Calibrate achieves better results than five state-of-the-art methods. The proposed K2Calibrate is plug-and-play and can be easily integrated with different model-driven deep learning reconstruction methods.
IVAug 11, 2024Code
TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature ScalingRuiquan Ge, Xiao Yu, Yifei Chen et al.
Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named as TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic clipping strategy. TC-KANRecon model aims to accelerate the MRI reconstruction process through deep learning methods while maintaining the quality of the reconstructed images. The MF-UKAN module can effectively balance the tradeoff between image denoising and structure preservation. Specifically, it presents the multi-head attention mechanisms and scalar modulation factors, which significantly enhances the model's robustness and structure preservation capabilities in complex noise environments. Moreover, the dynamic clipping strategy in TC-KANRecon adjusts the cropping interval according to the sampling steps, thereby mitigating image detail loss typicalching the visual features of the images. Furthermore, the MC-Model incorporates full-sampling k-space information, realizing efficient fusion of conditional information, enhancing the model's ability to process complex data, and improving the realism and detail richness of reconstructed images. Experimental results demonstrate that the proposed method outperforms other MRI reconstruction methods in both qualitative and quantitative evaluations. Notably, TC-KANRecon method exhibits excellent reconstruction results when processing high-noise, low-sampling-rate MRI data. Our source code is available at https://github.com/lcbkmm/TC-KANRecon.
IVNov 25, 2022
Generative Modeling in Sinogram Domain for Sparse-view CT ReconstructionBing Guan, Cailian Yang, Liu Zhang et al.
The radiation dose in computed tomography (CT) examinations is harmful for patients but can be significantly reduced by intuitively decreasing the number of projection views. Reducing projection views usually leads to severe aliasing artifacts in reconstructed images. Previous deep learning (DL) techniques with sparse-view data require sparse-view/full-view CT image pairs to train the network with supervised manners. When the number of projection view changes, the DL network should be retrained with updated sparse-view/full-view CT image pairs. To relieve this limitation, we present a fully unsupervised score-based generative model in sinogram domain for sparse-view CT reconstruction. Specifically, we first train a score-based generative model on full-view sinogram data and use multi-channel strategy to form highdimensional tensor as the network input to capture their prior distribution. Then, at the inference stage, the stochastic differential equation (SDE) solver and data-consistency step were performed iteratively to achieve fullview projection. Filtered back-projection (FBP) algorithm was used to achieve the final image reconstruction. Qualitative and quantitative studies were implemented to evaluate the presented method with several CT data. Experimental results demonstrated that our method achieved comparable or better performance than the supervised learning counterparts.
IVMar 21, 2022
K-space and Image Domain Collaborative Energy based Model for Parallel MRI ReconstructionZongjiang Tu, Chen Jiang, Yu Guan et al.
Decreasing magnetic resonance (MR) image acquisition times can potentially make MR examinations more accessible. Prior arts including the deep learning models have been devoted to solving the problem of long MRI imaging time. Recently, deep generative models have exhibited great potentials in algorithm robustness and usage flexibility. Nevertheless, none of existing schemes can be learned or employed to the k-space measurement directly. Furthermore, how do the deep generative models work well in hybrid domain is also worth being investigated. In this work, by taking advantage of the deep energy-based models, we propose a k-space and image domain collaborative generative model to comprehensively estimate the MR data from under-sampled measurement. Experimental comparisons with the state-of-the-arts demonstrated that the proposed hybrid method has less error in reconstruction accuracy and is more stable under different acceleration factors
IVAug 30, 2023
Stage-by-stage Wavelet Optimization Refinement Diffusion Model for Sparse-View CT ReconstructionKai Xu, Shiyu Lu, Bin Huang et al.
Diffusion models have emerged as potential tools to tackle the challenge of sparse-view CT reconstruction, displaying superior performance compared to conventional methods. Nevertheless, these prevailing diffusion models predominantly focus on the sinogram or image domains, which can lead to instability during model training, potentially culminating in convergence towards local minimal solutions. The wavelet trans-form serves to disentangle image contents and features into distinct frequency-component bands at varying scales, adeptly capturing diverse directional structures. Employing the Wavelet transform as a guiding sparsity prior significantly enhances the robustness of diffusion models. In this study, we present an innovative approach named the Stage-by-stage Wavelet Optimization Refinement Diffusion (SWORD) model for sparse-view CT reconstruction. Specifically, we establish a unified mathematical model integrating low-frequency and high-frequency generative models, achieving the solution with optimization procedure. Furthermore, we perform the low-frequency and high-frequency generative models on wavelet's decomposed components rather than sinogram or image domains, ensuring the stability of model training. Our method rooted in established optimization theory, comprising three distinct stages, including low-frequency generation, high-frequency refinement and domain transform. Our experimental results demonstrate that the proposed method outperforms existing state-of-the-art methods both quantitatively and qualitatively.
IVMay 8, 2022
WKGM: Weight-K-space Generative Model for Parallel Imaging ReconstructionZongjiang Tu, Die Liu, Xiaoqing Wang et al.
Deep learning based parallel imaging (PI) has made great progresses in recent years to accelerate magnetic resonance imaging (MRI). Nevertheless, it still has some limitations, such as the robustness and flexibility of existing methods have great deficiency. In this work, we propose a method to explore the k-space domain learning via robust generative modeling for flexible calibration-less PI reconstruction, coined weight-k-space generative model (WKGM). Specifically, WKGM is a generalized k-space domain model, where the k-space weighting technology and high-dimensional space augmentation design are efficiently incorporated for score-based generative model training, resulting in good and robust reconstructions. In addition, WKGM is flexible and thus can be synergistically combined with various traditional k-space PI models, which can make full use of the correlation between multi-coil data and realizecalibration-less PI. Even though our model was trained on only 500 images, experimental results with varying sampling patterns and acceleration factors demonstrate that WKGM can attain state-of-the-art reconstruction results with the well-learned k-space generative prior.
IVAug 15, 2022
One-shot Generative Prior in Hankel-k-space for Parallel Imaging ReconstructionHong Peng, Chen Jiang, Jing Cheng et al.
Magnetic resonance imaging serves as an essential tool for clinical diagnosis. However, it suffers from a long acquisition time. The utilization of deep learning, especially the deep generative models, offers aggressive acceleration and better reconstruction in magnetic resonance imaging. Nevertheless, learning the data distribution as prior knowledge and reconstructing the image from limited data remains challenging. In this work, we propose a novel Hankel-k-space generative model (HKGM), which can generate samples from a training set of as little as one k-space data. At the prior learning stage, we first construct a large Hankel matrix from k-space data, then extract multiple structured k-space patches from the large Hankel matrix to capture the internal distribution among different patches. Extracting patches from a Hankel matrix enables the generative model to be learned from redundant and low-rank data space. At the iterative reconstruction stage, it is observed that the desired solution obeys the learned prior knowledge. The intermediate reconstruction solution is updated by taking it as the input of the generative model. The updated result is then alternatively operated by imposing low-rank penalty on its Hankel matrix and data consistency con-strain on the measurement data. Experimental results confirmed that the internal statistics of patches within a single k-space data carry enough information for learning a powerful generative model and provide state-of-the-art reconstruction.
IVDec 7, 2022
One Sample Diffusion Model in Projection Domain for Low-Dose CT ImagingBin Huang, Liu Zhang, Shiyu Lu et al.
Low-dose computed tomography (CT) plays a significant role in reducing the radiation risk in clinical applications. However, lowering the radiation dose will significantly degrade the image quality. With the rapid development and wide application of deep learning, it has brought new directions for the development of low-dose CT imaging algorithms. Therefore, we propose a fully unsupervised one sample diffusion model (OSDM)in projection domain for low-dose CT reconstruction. To extract sufficient prior information from single sample, the Hankel matrix formulation is employed. Besides, the penalized weighted least-squares and total variation are introduced to achieve superior image quality. Specifically, we first train a score-based generative model on one sinogram by extracting a great number of tensors from the structural-Hankel matrix as the network input to capture prior distribution. Then, at the inference stage, the stochastic differential equation solver and data consistency step are performed iteratively to obtain the sinogram data. Finally, the final image is obtained through the filtered back-projection algorithm. The reconstructed results are approaching to the normal-dose counterparts. The results prove that OSDM is practical and effective model for reducing the artifacts and preserving the image quality.
CVMay 11Code
FrequencyCT: Frequency domain pseudo-label generation for self-supervised low-dose CT denoisingGuoquan Wei, Liu Shi, Chong Chen et al.
Despite extensive research on computed tomography (CT) denoising, few studies exploit projection-domain data characteristics to mitigate noise correlation. To address this, this work proposes FrequencyCT, the first zero-shot self-supervised method for pseudo-label generation in the frequency domain for low-dose CT denoising. Leveraging the characteristic of the frequency domain that largely isolates noise from clean signals, a regional low-frequency anchoring technique is proposed. Phase-preserving amplitude modulation and mask perturbation in the high-frequency region generate pseudo-label data for self-supervision. The fluctuating noise variance in the projection domain prompts truncation of the generated samples to stabilize the network's optimization gradient. Evaluation results on multiple public and real-world datasets confirm the clinical application potential of this research, which will have a revolutionary impact on the field of denoising. The code can be obtained from https://github.com/yqx7150/FrequencyCT.
CVAug 30, 2023
Physics-Informed DeepMRI: Bridging the Gap from Heat Diffusion to k-Space InterpolationZhuo-Xu Cui, Congcong Liu, Xiaohong Fan et al.
In the field of parallel imaging (PI), alongside image-domain regularization methods, substantial research has been dedicated to exploring $k$-space interpolation. However, the interpretability of these methods remains an unresolved issue. Furthermore, these approaches currently face acceleration limitations that are comparable to those experienced by image-domain methods. In order to enhance interpretability and overcome the acceleration limitations, this paper introduces an interpretable framework that unifies both $k$-space interpolation techniques and image-domain methods, grounded in the physical principles of heat diffusion equations. Building upon this foundational framework, a novel $k$-space interpolation method is proposed. Specifically, we model the process of high-frequency information attenuation in $k$-space as a heat diffusion equation, while the effort to reconstruct high-frequency information from low-frequency regions can be conceptualized as a reverse heat equation. However, solving the reverse heat equation poses a challenging inverse problem. To tackle this challenge, we modify the heat equation to align with the principles of magnetic resonance PI physics and employ the score-based generative method to precisely execute the modified reverse heat diffusion. Finally, experimental validation conducted on publicly available datasets demonstrates the superiority of the proposed approach over traditional $k$-space interpolation methods, deep learning-based $k$-space interpolation methods, and conventional diffusion models in terms of reconstruction accuracy, particularly in high-frequency regions.
IVApr 5, 2022
Multi-Weight Respecification of Scan-specific Learning for Parallel ImagingHui Tao, Haifeng Wang, Shanshan Wang et al.
Parallel imaging is widely used in magnetic resonance imaging as an acceleration technology. Traditional linear reconstruction methods in parallel imaging often suffer from noise amplification. Recently, a non-linear robust artificial-neural-network for k-space interpolation (RAKI) exhibits superior noise resilience over other linear methods. However, RAKI performs poorly at high acceleration rates, and needs a large amount of autocalibration signals as the training samples. In order to tackle these issues, we propose a multi-weight method that implements multiple weighting matrices on the undersampled data, named as MW-RAKI. Enforcing multiple weighted matrices on the measurements can effectively reduce the influence of noise and increase the data constraints. Furthermore, we incorporate the strategy of multiple weighting matrixes into a residual version of RAKI, and form MW-rRAKI.Experimental compari-sons with the alternative methods demonstrated noticeably better reconstruction performances, particularly at high acceleration rates.
IVDec 15, 2022
Universal Generative Modeling in Dual-domain for Dynamic MR ImagingChuanming Yu, Yu Guan, Ziwen Ke et al.
Dynamic magnetic resonance image reconstruction from incomplete k-space data has generated great research interest due to its capability to reduce scan time. Never-theless, the reconstruction problem is still challenging due to its ill-posed nature. Recently, diffusion models espe-cially score-based generative models have exhibited great potential in algorithm robustness and usage flexi-bility. Moreover, the unified framework through the variance exploding stochastic differential equation (VE-SDE) is proposed to enable new sampling methods and further extend the capabilities of score-based gener-ative models. Therefore, by taking advantage of the uni-fied framework, we proposed a k-space and image Du-al-Domain collaborative Universal Generative Model (DD-UGM) which combines the score-based prior with low-rank regularization penalty to reconstruct highly under-sampled measurements. More precisely, we extract prior components from both image and k-space domains via a universal generative model and adaptively handle these prior components for faster processing while maintaining good generation quality. Experimental comparisons demonstrated the noise reduction and detail preservation abilities of the proposed method. Much more than that, DD-UGM can reconstruct data of differ-ent frames by only training a single frame image, which reflects the flexibility of the proposed model.
CVDec 11, 2022
Low-rank Tensor Assisted K-space Generative Model for Parallel Imaging ReconstructionWei Zhang, Zengwei Xiao, Hui Tao et al.
Although recent deep learning methods, especially generative models, have shown good performance in fast magnetic resonance imaging, there is still much room for improvement in high-dimensional generation. Considering that internal dimensions in score-based generative models have a critical impact on estimating the gradient of the data distribution, we present a new idea, low-rank tensor assisted k-space generative model (LR-KGM), for parallel imaging reconstruction. This means that we transform original prior information into high-dimensional prior information for learning. More specifically, the multi-channel data is constructed into a large Hankel matrix and the matrix is subsequently folded into tensor for prior learning. In the testing phase, the low-rank rotation strategy is utilized to impose low-rank constraints on tensor output of the generative network. Furthermore, we alternately use traditional generative iterations and low-rank high-dimensional tensor iterations for reconstruction. Experimental comparisons with the state-of-the-arts demonstrated that the proposed LR-KGM method achieved better performance.
IVJan 30Code
Visible Singularities Guided Correlation Network for Limited-Angle CT ReconstructionYiyang Wen, Liu Shi, Zekun Zhou et al.
Limited-angle computed tomography (LACT) offers the advantages of reduced radiation dose and shortened scanning time. Traditional reconstruction algorithms exhibit various inherent limitations in LACT. Currently, most deep learning-based LACT reconstruction methods focus on multi-domain fusion or the introduction of generic priors, failing to fully align with the core imaging characteristics of LACT-such as the directionality of artifacts and directional loss of structural information, which are caused by the absence of projection angles in certain directions. Inspired by the theory of visible and invisible singularities, taking into account the aforementioned core imaging characteristics of LACT, we propose a Visible Singularities Guided Correlation network for LACT reconstruction (VSGC). The design philosophy of VSGC consists of two core steps: First, extract VS edge features from LACT images and focus the model's attention on these VS. Second, establish correlations between the VS edge features and other regions of the image. Additionally, a multi-scale loss function with anisotropic constraint is employed to constrain the model to converge in multiple aspects. Finally, qualitative and quantitative validations are conducted on both simulated and real datasets to verify the effectiveness and feasibility of the proposed design. Particularly, in comparison with alternative methods, VSGC delivers more prominent performance in small angular ranges, with the PSNR improvement of 2.45 dB and the SSIM enhancement of 1.5\%. The code is publicly available at https://github.com/yqx7150/VSGC.
CVFeb 6Code
TLC-Plan: A Two-Level Codebook Based Network for End-to-End Vector Floorplan GenerationBiao Xiong, Zhen Peng, Ping Wang et al.
Automated floorplan generation aims to improve design quality, architectural efficiency, and sustainability by jointly modeling global spatial organization and precise geometric detail. However, existing approaches operate in raster space and rely on post hoc vectorization, which introduces structural inconsistencies and hinders end-to-end learning. Motivated by compositional spatial reasoning, we propose TLC-Plan, a hierarchical generative model that directly synthesizes vector floorplans from input boundaries, aligning with human architectural workflows based on modular and reusable patterns. TLC-Plan employs a two-level VQ-VAE to encode global layouts as semantically labeled room bounding boxes and to refine local geometries using polygon-level codes. This hierarchy is unified in a CodeTree representation, while an autoregressive transformer samples codes conditioned on the boundary to generate diverse and topologically valid designs, without requiring explicit room topology or dimensional priors. Extensive experiments show state-of-the-art performance on RPLAN dataset (FID = 1.84, MSE = 2.06) and leading results on LIFULL dataset. The proposed framework advances constraint-aware and scalable vector floorplan generation for real-world architectural applications. Source code and trained models are released at https://github.com/rosolose/TLC-PLAN.
CVJan 12Code
Anatomy Aware Cascade Network: Bridging Epistemic Uncertainty and Geometric Manifold for 3D Tooth SegmentationBing Yu, Liu Shi, Haitao Wang et al.
Accurate three-dimensional (3D) tooth segmentation from Cone-Beam Computed Tomography (CBCT) is a prerequisite for digital dental workflows. However, achieving high-fidelity segmentation remains challenging due to adhesion artifacts in naturally occluded scans, which are caused by low contrast and indistinct inter-arch boundaries. To address these limitations, we propose the Anatomy Aware Cascade Network (AACNet), a coarse-to-fine framework designed to resolve boundary ambiguity while maintaining global structural consistency. Specifically, we introduce two mechanisms: the Ambiguity Gated Boundary Refiner (AGBR) and the Signed Distance Map guided Anatomical Attention (SDMAA). The AGBR employs an entropy based gating mechanism to perform targeted feature rectification in high uncertainty transition zones. Meanwhile, the SDMAA integrates implicit geometric constraints via signed distance map to enforce topological consistency, preventing the loss of spatial details associated with standard pooling. Experimental results on a dataset of 125 CBCT volumes demonstrate that AACNet achieves a Dice Similarity Coefficient of 90.17 \% and a 95\% Hausdorff Distance of 3.63 mm, significantly outperforming state-of-the-art methods. Furthermore, the model exhibits strong generalization on an external dataset with an HD95 of 2.19 mm, validating its reliability for downstream clinical applications such as surgical planning. Code for AACNet is available at https://github.com/shiliu0114/AACNet.
CVNov 25, 2022
Generative Modeling in Structural-Hankel Domain for Color Image InpaintingZihao Li, Chunhua Wu, Shenglin Wu et al.
In recent years, some researchers focused on using a single image to obtain a large number of samples through multi-scale features. This study intends to a brand-new idea that requires only ten or even fewer samples to construct the low-rank structural-Hankel matrices-assisted score-based generative model (SHGM) for color image inpainting task. During the prior learning process, a certain amount of internal-middle patches are firstly extracted from several images and then the structural-Hankel matrices are constructed from these patches. To better apply the score-based generative model to learn the internal statistical distribution within patches, the large-scale Hankel matrices are finally folded into the higher dimensional tensors for prior learning. During the iterative inpainting process, SHGM views the inpainting problem as a conditional generation procedure in low-rank environment. As a result, the intermediate restored image is acquired by alternatively performing the stochastic differential equation solver, alternating direction method of multipliers, and data consistency steps. Experimental results demonstrated the remarkable performance and diversity of SHGM.
LGNov 1, 2023
PET Tracer Conversion among Brain PET via Variable Augmented Invertible NetworkBohui Shen, Wei Zhang, Xubiao Liu et al.
Positron emission tomography (PET) serves as an essential tool for diagnosis of encephalopathy and brain science research. However, it suffers from the limited choice of tracers. Nowadays, with the wide application of PET imaging in neuropsychiatric treatment, 6-18F-fluoro-3, 4-dihydroxy-L-phenylalanine (DOPA) has been found to be more effective than 18F-labeled fluorine-2-deoxyglucose (FDG) in the field. Nevertheless, due to the complexity of its preparation and other limitations, DOPA is far less widely used than FDG. To address this issue, a tracer conversion invertible neural network (TC-INN) for image projection is developed to map FDG images to DOPA images through deep learning. More diagnostic information is obtained by generating PET images from FDG to DOPA. Specifically, the proposed TC-INN consists of two separate phases, one for training traceable data, the other for rebuilding new data. The reference DOPA PET image is used as a learning target for the corresponding network during the training process of tracer conversion. Meanwhile, the invertible network iteratively estimates the resultant DOPA PET data and compares it to the reference DOPA PET data. Notably, the reversible model employs variable enhancement technique to achieve better power generation. Moreover, image registration needs to be performed before training due to the angular deviation of the acquired FDG and DOPA data information. Experimental results exhibited excellent generation capability in mapping between FDG and DOPA, suggesting that PET tracer conversion has great potential in the case of limited tracer applications.
IVOct 3, 2023
Synthetic CT Generation via Variant Invertible Network for All-digital Brain PET Attenuation CorrectionYu Guan, Bohui Shen, Xinchong Shi et al.
Attenuation correction (AC) is essential for the generation of artifact-free and quantitatively accurate positron emission tomography (PET) images. However, AC of PET faces challenges including inter-scan motion and erroneous transformation of structural voxel-intensities to PET attenuation-correction factors. Nowadays, the problem of AC for quantitative PET have been solved to a large extent after the commercial availability of devices combining PET with computed tomography (CT). Meanwhile, considering the feasibility of a deep learning approach for PET AC without anatomical imaging, this paper develops a PET AC method, which uses deep learning to generate continuously valued CT images from non-attenuation corrected PET images for AC on brain PET imaging. Specifically, an invertible network combined with the variable augmentation strategy that can achieve the bidirectional inference processes is proposed for synthetic CT generation (IVNAC). To evaluate the performance of the proposed algorithm, we conducted a comprehensive study on a total of 1440 data from 37 clinical patients using comparative algorithms (such as Cycle-GAN and Pix2pix). Perceptual analysis and quantitative evaluations illustrate that the invertible network for PET AC outperforms other existing AC models, which demonstrates the potential of the proposed method and the feasibility of achieving brain PET AC without CT.
CVFeb 12
PLOT-CT: Pre-log Voronoi Decomposition Assisted Generation for Low-dose CT ReconstructionBin Huang, Xun Yu, Yikun Zhang et al.
Low-dose computed tomography (LDCT) reconstruction is fundamentally challenged by severe noise and compromised data fidelity under reduced radiation exposure. Most existing methods operate either in the image or post-log projection domain, which fails to fully exploit the rich structural information in pre-log measurements while being highly susceptible to noise. The requisite logarithmic transformation critically amplifies noise within these data, imposing exceptional demands on reconstruction precision. To overcome these challenges, we propose PLOT-CT, a novel framework for Pre-Log vOronoi decomposiTion-assisted CT generation. Our method begins by applying Voronoi decomposition to pre-log sinograms, disentangling the data into distinct underlying components, which are embedded in separate latent spaces. This explicit decomposition significantly enhances the model's capacity to learn discriminative features, directly improving reconstruction accuracy by mitigating noise and preserving information inherent in the pre-log domain. Extensive experiments demonstrate that PLOT-CT achieves state-of-the-art performance, attaining a 2.36dB PSNR improvement over traditional methods at the 1e4 incident photon level in the pre-log domain.
CVJul 1, 2024
Diffusion Transformer Model With Compact Prior for Low-dose PET ReconstructionBin Huang, Xubiao Liu, Lei Fang et al.
Positron emission tomography (PET) is an advanced medical imaging technique that plays a crucial role in non-invasive clinical diagnosis. However, while reducing radiation exposure through low-dose PET scans is beneficial for patient safety, it often results in insufficient statistical data. This scarcity of data poses significant challenges for accurately reconstructing high-quality images, which are essential for reliable diagnostic outcomes. In this research, we propose a diffusion transformer model (DTM) guided by joint compact prior (JCP) to enhance the reconstruction quality of low-dose PET imaging. In light of current research findings, we present a pioneering PET reconstruction model that integrates diffusion and transformer models for joint optimization. This model combines the powerful distribution mapping abilities of diffusion models with the capacity of transformers to capture long-range dependencies, offering significant advantages for low-dose PET reconstruction. Additionally, the incorporation of the lesion refining block and penalized weighted least squares (PWLS) enhance the recovery capability of lesion regions and preserves detail information, solving blurring problems in lesion areas and texture details of most deep learning frameworks. Experimental results demonstrate the effectiveness of DTM in enhancing image quality and preserving critical clinical information for low-dose PET scans. Our approach not only reduces radiation exposure risks but also provides a more reliable PET imaging tool for early disease detection and patient management.
LGNov 8, 2024Code
RED: Residual Estimation Diffusion for Low-Dose PET Sinogram ReconstructionXingyu Ai, Bin Huang, Fang Chen et al.
Recent advances in diffusion models have demonstrated exceptional performance in generative tasks across vari-ous fields. In positron emission tomography (PET), the reduction in tracer dose leads to information loss in sino-grams. Using diffusion models to reconstruct missing in-formation can improve imaging quality. Traditional diffu-sion models effectively use Gaussian noise for image re-constructions. However, in low-dose PET reconstruction, Gaussian noise can worsen the already sparse data by introducing artifacts and inconsistencies. To address this issue, we propose a diffusion model named residual esti-mation diffusion (RED). From the perspective of diffusion mechanism, RED uses the residual between sinograms to replace Gaussian noise in diffusion process, respectively sets the low-dose and full-dose sinograms as the starting point and endpoint of reconstruction. This mechanism helps preserve the original information in the low-dose sinogram, thereby enhancing reconstruction reliability. From the perspective of data consistency, RED introduces a drift correction strategy to reduce accumulated prediction errors during the reverse process. Calibrating the inter-mediate results of reverse iterations helps maintain the data consistency and enhances the stability of reconstruc-tion process. Experimental results show that RED effec-tively improves the quality of low-dose sinograms as well as the reconstruction results. The code is available at: https://github.com/yqx7150/RED.
CVMar 11, 2025Code
Diffusion Transformer Meets Random Masks: An Advanced PET Reconstruction FrameworkBin Huang, Binzhong He, Yanhan Chen et al.
Deep learning has significantly advanced PET image re-construction, achieving remarkable improvements in image quality through direct training on sinogram or image data. Traditional methods often utilize masks for inpainting tasks, but their incorporation into PET reconstruction frameworks introduces transformative potential. In this study, we pro-pose an advanced PET reconstruction framework called Diffusion tRansformer mEets rAndom Masks (DREAM). To the best of our knowledge, this is the first work to integrate mask mechanisms into both the sinogram domain and the latent space, pioneering their role in PET reconstruction and demonstrating their ability to enhance reconstruction fidelity and efficiency. The framework employs a high-dimensional stacking approach, transforming masked data from two to three dimensions to expand the solution space and enable the model to capture richer spatial rela-tionships. Additionally, a mask-driven latent space is de-signed to accelerate the diffusion process by leveraging sinogram-driven and mask-driven compact priors, which reduce computational complexity while preserving essen-tial data characteristics. A hierarchical masking strategy is also introduced, guiding the model from focusing on fi-ne-grained local details in the early stages to capturing broader global patterns over time. This progressive ap-proach ensures a balance between detailed feature preservation and comprehensive context understanding. Experimental results demonstrate that DREAM not only improves the overall quality of reconstructed PET images but also preserves critical clinical details, highlighting its potential to advance PET imaging technology. By inte-grating compact priors and hierarchical masking, DREAM offers a promising and efficient avenue for future research and application in PET imaging. The open-source code is available at: https://github.com/yqx7150/DREAM.
IVFeb 6
AS-Mamba: Asymmetric Self-Guided Mamba Decoupled Iterative Network for Metal Artifact ReductionBowen Ning, Zekun Zhou, Xinyi Zhong et al.
Metal artifact significantly degrades Computed Tomography (CT) image quality, impeding accurate clinical diagnosis. However, existing deep learning approaches, such as CNN and Transformer, often fail to explicitly capture the directional geometric features of artifacts, leading to compromised structural restoration. To address these limitations, we propose the Asymmetric Self-Guided Mamba (AS-Mamba) for metal artifact reduction. Specifically, the linear propagation of metal-induced streak artifacts aligns well with the sequential modeling capability of State Space Models (SSMs). Consequently, the Mamba architecture is leveraged to explicitly capture and suppress these directional artifacts. Simultaneously, a frequency domain correction mechanism is incorporated to rectify the global amplitude spectrum, thereby mitigating intensity inhomogeneity caused by beam hardening. Furthermore, to bridge the distribution gap across diverse clinical scenarios, we introduce a self-guided contrastive regularization strategy. Extensive experiments on public andclinical dental CBCT datasets demonstrate that AS-Mamba achieves superior performance in suppressing directional streaks and preserving structural details, validating the effectiveness of integrating physical geometric priors into deep network design.
CVAug 25, 2025Code
UniSino: Physics-Driven Foundational Model for Universal CT Sinogram StandardizationXingyu Ai, Shaoyu Wang, Zhiyuan Jia et al.
During raw-data acquisition in CT imaging, diverse factors can degrade the collected sinograms, with undersampling and noise leading to severe artifacts and noise in reconstructed images and compromising diagnostic accuracy. Conventional correction methods rely on manually designed algorithms or fixed empirical parameters, but these approaches often lack generalizability across heterogeneous artifact types. To address these limitations, we propose UniSino, a foundation model for universal CT sinogram standardization. Unlike existing foundational models that operate in image domain, UniSino directly standardizes data in the projection domain, which enables stronger generalization across diverse undersampling scenarios. Its training framework incorporates the physical characteristics of sinograms, enhancing generalization and enabling robust performance across multiple subtasks spanning four benchmark datasets. Experimental results demonstrate thatUniSino achieves superior reconstruction quality both single and mixed undersampling case, demonstrating exceptional robustness and generalization in sinogram enhancement for CT imaging. The code is available at: https://github.com/yqx7150/UniSino.
CVJul 14, 2025Code
LifelongPR: Lifelong point cloud place recognition based on sample replay and prompt learningXianghong Zou, Jianping Li, Zhe Chen et al.
Point cloud place recognition (PCPR) determines the geo-location within a prebuilt map and plays a crucial role in geoscience and robotics applications such as autonomous driving, intelligent transportation, and augmented reality. In real-world large-scale deployments of a geographic positioning system, PCPR models must continuously acquire, update, and accumulate knowledge to adapt to diverse and dynamic environments, i.e., the ability known as continual learning (CL). However, existing PCPR models often suffer from catastrophic forgetting, leading to significant performance degradation in previously learned scenes when adapting to new environments or sensor types. This results in poor model scalability, increased maintenance costs, and system deployment difficulties, undermining the practicality of PCPR. To address these issues, we propose LifelongPR, a novel continual learning framework for PCPR, which effectively extracts and fuses knowledge from sequential point cloud data. First, to alleviate the knowledge loss, we propose a replay sample selection method that dynamically allocates sample sizes according to each dataset's information quantity and selects spatially diverse samples for maximal representativeness. Second, to handle domain shifts, we design a prompt learning-based CL framework with a lightweight prompt module and a two-stage training strategy, enabling domain-specific feature adaptation while minimizing forgetting. Comprehensive experiments on large-scale public and self-collected datasets are conducted to validate the effectiveness of the proposed method. Compared with state-of-the-art (SOTA) methods, our method achieves 6.50% improvement in mIR@1, 7.96% improvement in mR@1, and an 8.95% reduction in F. The code and pre-trained models are publicly available at https://github.com/zouxianghong/LifelongPR.
IVJun 20, 2025Code
PET Tracer Separation Using Conditional Diffusion Transformer with Multi-latent Space LearningBin Huang, Feihong Xu, Xinchong Shi et al.
In clinical practice, single-radiotracer positron emission tomography (PET) is commonly used for imaging. Although multi-tracer PET imaging can provide supplementary information of radiotracers that are sensitive to physiological function changes, enabling a more comprehensive characterization of physiological and pathological states, the gamma-photon pairs generated by positron annihilation reactions of different tracers in PET imaging have the same energy, making it difficult to distinguish the tracer signals. In this study, a multi-latent space guided texture conditional diffusion transformer model (MS-CDT) is proposed for PET tracer separation. To the best of our knowledge, this is the first attempt to use texture condition and multi-latent space for tracer separation in PET imaging. The proposed model integrates diffusion and transformer architectures into a unified optimization framework, with the novel addition of texture masks as conditional inputs to enhance image details. By leveraging multi-latent space prior derived from different tracers, the model captures multi-level feature representations, aiming to balance computational efficiency and detail preservation. The texture masks, serving as conditional guidance, help the model focus on salient structural patterns, thereby improving the extraction and utilization of fine-grained image textures. When combined with the diffusion transformer backbone, this conditioning mechanism contributes to more accurate and robust tracer separation. To evaluate its effectiveness, the proposed MS-CDT is compared with several advanced methods on two types of 3D PET datasets: brain and chest scans. Experimental results indicate that MS-CDT achieved competitive performance in terms of image quality and preservation of clinically relevant information. Code is available at: https://github.com/yqx7150/MS-CDT.
IVJun 20, 2025Code
A Prior-Guided Joint Diffusion Model in Projection Domain for PET Tracer ConversionFang Chen, Weifeng Zhang, Xingyu Ai et al.
Positron emission tomography (PET) is widely used to assess metabolic activity, but its application is limited by the availability of radiotracers. 18F-labeled fluorodeoxyglucose (18F-FDG) is the most commonly used tracer but shows limited effectiveness for certain tumors. In contrast, 6-18F-fluoro-3,4-dihydroxy-L-phenylalanine (18F-DOPA) offers higher specificity for neuroendocrine tumors and neurological disorders. However, the complexity of its synthesis process and constraints on transportation time have limited its clinical application. Among different forms of raw data acquired by the scanner, sinogram is a commonly used representation in PET imaging. Therefore, modeling in projection domain enables more direct utilization of the original information, potentially reducing the accumulation errors during the image reconstruction process. Inspired by these factors, this study proposes a prior-guided joint diffusion model (PJDM) for transforming 18F-FDG PET sinograms into 18F-DOPA PET sinograms. During inference, an initial synthetic 18F-DOPA PET sinogram is first generated using a higher-order hybrid sampler. This sinogram is then degraded and serves as an additional condition to guide the iterative refinement process. Experimental results demonstrated that PJDM effectively improved both sinogram quality and the final synthetic outcomes. The code is available at: https://github.com/yqx7150/PJDM.
IVJun 16, 2025Code
PRO: Projection Domain Synthesis for CT ImagingKang Chen, Bin Huang, Xuebin Yang et al.
Synthetic CT projection data is crucial for advancing imaging research, yet its generation remains challenging. Current image domain methods are limited as they cannot simulate the physical acquisition process or utilize the complete statistical information present in projection data, restricting their utility and fidelity. In this work, we present PRO, a projection domain synthesis foundation model for CT imaging. To the best of our knowledge, this is the first study that performs CT synthesis in the projection domain. Unlike previous approaches that operate in the image domain, PRO learns rich structural representations from projection data and leverages anatomical text prompts for controllable synthesis. Projection data generation models can utilize complete measurement signals and simulate the physical processes of scanning, including material attenuation characteristics, beam hardening, scattering, and projection geometry, and support research on downstream imaging tasks. Moreover, PRO functions as a foundation model, capable of generalizing across diverse downstream tasks by adjusting its generative behavior via prompt inputs. Experimental results demonstrated that incorporating our synthesized data significantly improves performance across multiple downstream tasks, including low-dose and sparse-view reconstruction. These findings underscore the versatility and scalability of PRO in data generation for various CT applications. These results highlight the potential of projection domain synthesis as a powerful tool for data augmentation and robust CT imaging. Our source code is publicly available at: https://github.com/yqx7150/PRO.
CVSep 26, 2021Code
Self-Supervised Learning for MRI Reconstruction with a Parallel Network Training FrameworkChen Hu, Cheng Li, Haifeng Wang et al.
Image reconstruction from undersampled k-space data plays an important role in accelerating the acquisition of MR data, and a lot of deep learning-based methods have been exploited recently. Despite the achieved inspiring results, the optimization of these methods commonly relies on the fully-sampled reference data, which are time-consuming and difficult to collect. To address this issue, we propose a novel self-supervised learning method. Specifically, during model optimization, two subsets are constructed by randomly selecting part of k-space data from the undersampled data and then fed into two parallel reconstruction networks to perform information recovery. Two reconstruction losses are defined on all the scanned data points to enhance the network's capability of recovering the frequency information. Meanwhile, to constrain the learned unscanned data points of the network, a difference loss is designed to enforce consistency between the two parallel networks. In this way, the reconstruction model can be properly trained with only the undersampled data. During the model evaluation, the undersampled data are treated as the inputs and either of the two trained networks is expected to reconstruct the high-quality results. The proposed method is flexible and can be employed in any existing deep learning-based method. The effectiveness of the method is evaluated on an open brain MRI dataset. Experimental results demonstrate that the proposed self-supervised method can achieve competitive reconstruction performance compared to the corresponding supervised learning method at high acceleration rates (4 and 8). The code is publicly available at \url{https://github.com/chenhu96/Self-Supervised-MRI-Reconstruction}.
IVJul 16, 2019Code
X-Net: Brain Stroke Lesion Segmentation Based on Depthwise Separable Convolution and Long-range DependenciesKehan Qi, Hao Yang, Cheng Li et al.
The morbidity of brain stroke increased rapidly in the past few years. To help specialists in lesion measurements and treatment planning, automatic segmentation methods are critically required for clinical practices. Recently, approaches based on deep learning and methods for contextual information extraction have served in many image segmentation tasks. However, their performances are limited due to the insufficient training of a large number of parameters, which sometimes fail in capturing long-range dependencies. To address these issues, we propose a depthwise separable convolution based X-Net that designs a nonlocal operation namely Feature Similarity Module (FSM) to capture long-range dependencies. The adopted depthwise convolution allows to reduce the network size, while the developed FSM provides a more effective, dense contextual information extraction and thus facilitates better segmentation. The effectiveness of X-Net was evaluated on an open dataset Anatomical Tracings of Lesions After Stroke (ATLAS) with superior performance achieved compared to other six state-of-the-art approaches. We make our code and models available at https://github.com/Andrewsher/X-Net.
CVFeb 12
GR-Diffusion: 3D Gaussian Representation Meets Diffusion in Whole-Body PET ReconstructionMengxiao Geng, Zijie Chen, Ran Hong et al.
Positron emission tomography (PET) reconstruction is a critical challenge in molecular imaging, often hampered by noise amplification, structural blurring, and detail loss due to sparse sampling and the ill-posed nature of inverse problems. The three-dimensional discrete Gaussian representation (GR), which efficiently encodes 3D scenes using parameterized discrete Gaussian distributions, has shown promise in computer vision. In this work, we pro-pose a novel GR-Diffusion framework that synergistically integrates the geometric priors of GR with the generative power of diffusion models for 3D low-dose whole-body PET reconstruction. GR-Diffusion employs GR to generate a reference 3D PET image from projection data, establishing a physically grounded and structurally explicit benchmark that overcomes the low-pass limitations of conventional point-based or voxel-based methods. This reference image serves as a dual guide during the diffusion process, ensuring both global consistency and local accuracy. Specifically, we employ a hierarchical guidance mechanism based on the GR reference. Fine-grained guidance leverages differences to refine local details, while coarse-grained guidance uses multi-scale difference maps to correct deviations. This strategy allows the diffusion model to sequentially integrate the strong geometric prior from GR and recover sub-voxel information. Experimental results on the UDPET and Clinical datasets with varying dose levels show that GR-Diffusion outperforms state-of-the-art methods in enhancing 3D whole-body PET image quality and preserving physiological details.
CVApr 30
Residual Gaussian Splatting for Ultra Sparse-View CBCT ReconstructionJian Lin, Jiancheng Fang, Shaoyu Wang et al.
While 3D Gaussian splatting (3DGS) offers explicit and efficient scene representations for cone-beam computed tomography reconstruction, conventional photometric optimization inherently suffers from spectral bias under ultra sparse-view conditions, leading to over-smoothing and a loss of high-frequency anatomical details. Since wavelet transforms provide rich high-frequency information and have been widely utilized to enhance sparse reconstruction, this work integrates wavelet multi-resolution analysis with 3DGS. To circumvent the mathematical mismatch between the strict non-negativity of physical X-ray attenuation and the bipolar nature of high-frequency wavelet coefficients, we propose Residual Gaussian Splatting (RGS). Methodologically, we introduce a spectrally-decoupled Gaussian representation that stratifies the volumetric field into a geometric base component and a residual detail component. This decomposition systematically transforms explicit high-frequency fitting into a physically consistent, implicit residual compensation task. Furthermore, we devise a spectral-spatial collaborative optimization strategy to coordinate the interplay between geometric anchoring and texture refinement, effectively preventing spectral crosstalk. Extensive experiments on clinical datasets demonstrate that RGS enables the reconstructed images to capture highly refined geometric textures. It successfully resolves the trade-off between artifact suppression and detail preservation, yielding superior visual fidelity in complex trabecular and vascular structures compared to existing neural rendering baselines.
CVJan 3, 2024
Multimodal self-supervised learning for lesion localizationHao Yang, Hong-Yu Zhou, Cheng Li et al.
Multimodal deep learning utilizing imaging and diagnostic reports has made impressive progress in the field of medical imaging diagnostics, demonstrating a particularly strong capability for auxiliary diagnosis in cases where sufficient annotation information is lacking. Nonetheless, localizing diseases accurately without detailed positional annotations remains a challenge. Although existing methods have attempted to utilize local information to achieve fine-grained semantic alignment, their capability in extracting the fine-grained semantics of the comprehensive context within reports is limited. To address this problem, a new method is introduced that takes full sentences from textual reports as the basic units for local semantic alignment. This approach combines chest X-ray images with their corresponding textual reports, performing contrastive learning at both global and local levels. The leading results obtained by this method on multiple datasets confirm its efficacy in the task of lesion localization.
IVJan 17, 2025
Physics-informed DeepCT: Sinogram Wavelet Decomposition Meets Masked DiffusionZekun Zhou, Tan Liu, Bing Yu et al.
Diffusion model shows remarkable potential on sparse-view computed tomography (SVCT) reconstruction. However, when a network is trained on a limited sample space, its generalization capability may be constrained, which degrades performance on unfamiliar data. For image generation tasks, this can lead to issues such as blurry details and inconsistencies between regions. To alleviate this problem, we propose a Sinogram-based Wavelet random decomposition And Random mask diffusion Model (SWARM) for SVCT reconstruction. Specifically, introducing a random mask strategy in the sinogram effectively expands the limited training sample space. This enables the model to learn a broader range of data distributions, enhancing its understanding and generalization of data uncertainty. In addition, applying a random training strategy to the high-frequency components of the sinogram wavelet enhances feature representation and improves the ability to capture details in different frequency bands, thereby improving performance and robustness. Two-stage iterative reconstruction method is adopted to ensure the global consistency of the reconstructed image while refining its details. Experimental results demonstrate that SWARM outperforms competing approaches in both quantitative and qualitative performance across various datasets.
IVNov 6, 2024
Sub-DM:Subspace Diffusion Model with Orthogonal Decomposition for MRI ReconstructionYu Guan, Qinrong Cai, Wei Li et al.
Diffusion model-based approaches recently achieved re-markable success in MRI reconstruction, but integration into clinical routine remains challenging due to its time-consuming convergence. This phenomenon is partic-ularly notable when directly apply conventional diffusion process to k-space data without considering the inherent properties of k-space sampling, limiting k-space learning efficiency and image reconstruction quality. To tackle these challenges, we introduce subspace diffusion model with orthogonal decomposition, a method (referred to as Sub-DM) that restrict the diffusion process via projections onto subspace as the k-space data distribution evolves toward noise. Particularly, the subspace diffusion model circumvents the inference challenges posed by the com-plex and high-dimensional characteristics of k-space data, so the highly compact subspace ensures that diffusion process requires only a few simple iterations to produce accurate prior information. Furthermore, the orthogonal decomposition strategy based on wavelet transform hin-ders the information loss during the migration of the vanilla diffusion process to the subspace. Considering the strate-gy is approximately reversible, such that the entire pro-cess can be reversed. As a result, it allows the diffusion processes in different spaces to refine models through a mutual feedback mechanism, enabling the learning of ac-curate prior even when dealing with complex k-space data. Comprehensive experiments on different datasets clearly demonstrate that the superiority of Sub-DM against state of-the-art methods in terms of reconstruction speed and quality.
IVOct 30, 2024
st-DTPM: Spatial-Temporal Guided Diffusion Transformer Probabilistic Model for Delayed Scan PET Image PredictionRan Hong, Yuxia Huang, Lei Liu et al.
PET imaging is widely employed for observing biological metabolic activities within the human body. However, numerous benign conditions can cause increased uptake of radiopharmaceuticals, confounding differentiation from malignant tumors. Several studies have indicated that dual-time PET imaging holds promise in distinguishing between malignant and benign tumor processes. Nevertheless, the hour-long distribution period of radiopharmaceuticals post-injection complicates the determination of optimal timing for the second scan, presenting challenges in both practical applications and research. Notably, we have identified that delay time PET imaging can be framed as an image-to-image conversion problem. Motivated by this insight, we propose a novel spatial-temporal guided diffusion transformer probabilistic model (st-DTPM) to solve dual-time PET imaging prediction problem. Specifically, this architecture leverages the U-net framework that integrates patch-wise features of CNN and pixel-wise relevance of Transformer to obtain local and global information. And then employs a conditional DDPM model for image synthesis. Furthermore, on spatial condition, we concatenate early scan PET images and noisy PET images on every denoising step to guide the spatial distribution of denoising sampling. On temporal condition, we convert diffusion time steps and delay time to a universal time vector, then embed it to each layer of model architecture to further improve the accuracy of predictions. Experimental results demonstrated the superiority of our method over alternative approaches in preserving image quality and structural information, thereby affirming its efficacy in predictive task.
CVJan 4, 2024
A multi-modal vision-language model for generalizable annotation-free pathology localizationHao Yang, Hong-Yu Zhou, Jiarun Liu et al.
Existing deep learning models for defining pathology from clinical imaging data rely on expert annotations and lack generalization capabilities in open clinical environments. Here, we present a generalizable vision-language model for Annotation-Free pathology Localization (AFLoc). The core strength of AFLoc is extensive multi-level semantic structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts with abundant image features to adapt to the diverse expressions of pathologies without the reliance on expert image annotations. We conduct primary experiments on a dataset of 220K pairs of image-report chest X-ray images and perform validation across eight external datasets encompassing 34 types of chest pathologies. The results demonstrate that AFLoc outperforms state-of-the-art methods in both annotation-free localization and classification tasks. Additionally, we assess the generalizability of AFLoc on other modalities, including histopathology and retinal fundus images. We show that AFLoc exhibits robust generalization capabilities, even surpassing human benchmarks in localizing five different types of pathological images. These results highlight the potential of AFLoc in reducing annotation requirements and its applicability in complex clinical environments.
CVJan 3, 2024
AID-DTI: Accelerating High-fidelity Diffusion Tensor Imaging with Detail-Preserving Model-based Deep LearningWenxin Fan, Jian Cheng, Cheng Li et al.
Deep learning has shown great potential in accelerating diffusion tensor imaging (DTI). Nevertheless, existing methods tend to suffer from Rician noise and detail loss in reconstructing the DTI-derived parametric maps especially when sparsely sampled q-space data are used. This paper proposes a novel method, AID-DTI (Accelerating hIgh fiDelity Diffusion Tensor Imaging), to facilitate fast and accurate DTI with only six measurements. AID-DTI is equipped with a newly designed Singular Value Decomposition (SVD)-based regularizer, which can effectively capture fine details while suppressing noise during network training. Experimental results on Human Connectome Project (HCP) data consistently demonstrate that the proposed method estimates DTI parameter maps with fine-grained details and outperforms three state-of-the-art methods both quantitatively and qualitatively.
CVNov 18, 2025
Iterative Diffusion-Refined Neural Attenuation Fields for Multi-Source Stationary CT Reconstruction: NAF Meets Diffusion ModelJiancheng Fang, Shaoyu Wang, Junlin Wang et al.
Multi-source stationary computed tomography (CT) has recently attracted attention for its ability to achieve rapid image reconstruction, making it suitable for time-sensitive clinical and industrial applications. However, practical systems are often constrained by ultra-sparse-view sampling, which significantly degrades reconstruction quality. Traditional methods struggle under ultra-sparse-view settings, where interpolation becomes inaccurate and the resulting reconstructions are unsatisfactory. To address this challenge, this study proposes Diffusion-Refined Neural Attenuation Fields (Diff-NAF), an iterative framework tailored for multi-source stationary CT under ultra-sparse-view conditions. Diff-NAF combines a Neural Attenuation Field representation with a dual-branch conditional diffusion model. The process begins by training an initial NAF using ultra-sparse-view projections. New projections are then generated through an Angle-Prior Guided Projection Synthesis strategy that exploits inter view priors, and are subsequently refined by a Diffusion-driven Reuse Projection Refinement Module. The refined projections are incorporated as pseudo-labels into the training set for the next iteration. Through iterative refinement, Diff-NAF progressively enhances projection completeness and reconstruction fidelity under ultra-sparse-view conditions, ultimately yielding high-quality CT reconstructions. Experimental results on multiple simulated 3D CT volumes and real projection data demonstrate that Diff-NAF achieves the best performance under ultra-sparse-view conditions.
CVSep 28, 2025
Tunable-Generalization Diffusion Powered by Self-Supervised Contextual Sub-Data for Low-Dose CT ReconstructionGuoquan Wei, Liu Shi, Zekun Zhou et al.
Current models based on deep learning for low-dose CT denoising rely heavily on paired data and generalize poorly. Even the more concerned diffusion models need to learn the distribution of clean data for reconstruction, which is difficult to satisfy in medical clinical applications. At the same time, self-supervised-based methods face the challenge of significant degradation of generalizability of models pre-trained for the current dose to expand to other doses. To address these issues, this work proposes a novel method of TUnable-geneRalizatioN Diffusion (TurnDiff) powered by self-supervised contextual sub-data for low-dose CT reconstruction. Firstly, a contextual subdata self-enhancing similarity strategy is designed for denoising centered on the LDCT projection domain, which provides an initial prior for the subsequent progress. Subsequently, the initial prior is used to combine knowledge distillation with a deep combination of latent diffusion models for optimizing image details. The pre-trained model is used for inference reconstruction, and the pixel-level self-correcting fusion technique is proposed for fine-grained reconstruction of the image domain to enhance the image fidelity, using the initial prior and the LDCT image as a guide. In addition, the technique is flexibly applied to the generalization of upper and lower doses or even unseen doses. Dual-domain strategy cascade for self-supervised LDCT denoising, TurnDiff requires only LDCT projection domain data for training and testing. Comprehensive evaluation on both benchmark datasets and real-world data demonstrates that TurnDiff consistently outperforms state-of-the-art methods in both reconstruction and generalization.
CVSep 11, 2025
ALL-PET: A Low-resource and Low-shot PET Foundation Model in Projection DomainBin Huang, Kang Chen, Bingxuan Li et al.
Building large-scale foundation model for PET imaging is hindered by limited access to labeled data and insufficient computational resources. To overcome data scarcity and efficiency limitations, we propose ALL-PET, a low-resource, low-shot PET foundation model operating directly in projection domain. ALL-PET leverages a latent diffusion model (LDM) with three key innovations. First, we design a Radon mask augmentation strategy (RMAS) that generates over 200,000 structurally diverse training samples by projecting randomized image-domain masks into sinogram space, significantly improving generalization with minimal data. This is extended by a dynamic multi-mask (DMM) mechanism that varies mask quantity and distribution, enhancing data diversity without added model complexity. Second, we implement positive/negative mask constraints to embed strict geometric consistency, reducing parameter burden while preserving generation quality. Third, we introduce transparent medical attention (TMA), a parameter-free, geometry-driven mechanism that enhances lesion-related regions in raw projection data. Lesion-focused attention maps are derived from coarse segmentation, covering both hypermetabolic and hypometabolic areas, and projected into sinogram space for physically consistent guidance. The system supports clinician-defined ROI adjustments, ensuring flexible, interpretable, and task-adaptive emphasis aligned with PET acquisition physics. Experimental results show that ALL-PET achieves high-quality sinogram generation using only 500 samples, with performance comparable to models trained on larger datasets. ALL-PET generalizes across tasks including low-dose reconstruction, attenuation correction, delayed-frame prediction, and tracer separation, operating efficiently with memory use under 24GB.
CVSep 7, 2025
Physics-Guided Null-Space Diffusion with Sparse Masking for Corrective Sparse-View CT ReconstructionZekun Zhou, Yanru Gong, Liu Shi et al.
Diffusion models have demonstrated remarkable generative capabilities in image processing tasks. We propose a Sparse condition Temporal Rewighted Integrated Distribution Estimation guided diffusion model (STRIDE) for sparse-view CT reconstruction. Specifically, we design a joint training mechanism guided by sparse conditional probabilities to facilitate the model effective learning of missing projection view completion and global information modeling. Based on systematic theoretical analysis, we propose a temporally varying sparse condition reweighting guidance strategy to dynamically adjusts weights during the progressive denoising process from pure noise to the real image, enabling the model to progressively perceive sparse-view information. The linear regression is employed to correct distributional shifts between known and generated data, mitigating inconsistencies arising during the guidance process. Furthermore, we construct a dual-network parallel architecture to perform global correction and optimization across multiple sub-frequency components, thereby effectively improving the model capability in both detail restoration and structural preservation, ultimately achieving high-quality image reconstruction. Experimental results on both public and real datasets demonstrate that the proposed method achieves the best improvement of 2.58 dB in PSNR, increase of 2.37\% in SSIM, and reduction of 0.236 in MSE compared to the best-performing baseline methods. The reconstructed images exhibit excellent generalization and robustness in terms of structural consistency, detail restoration, and artifact suppression.
CVAug 30, 2025
Double-Constraint Diffusion Model with Nuclear Regularization for Ultra-low-dose PET ReconstructionMengxiao Geng, Ran Hong, Bingxuan Li et al.
Ultra-low-dose positron emission tomography (PET) reconstruction holds significant potential for reducing patient radiation exposure and shortening examination times. However, it may also lead to increased noise and reduced imaging detail, which could decrease the image quality. In this study, we present a Double-Constraint Diffusion Model (DCDM), which freezes the weights of a pre-trained diffusion model and injects a trainable double-constraint controller into the encoding architecture, greatly reducing the number of trainable parameters for ultra-low-dose PET reconstruction. Unlike full fine-tuning models, DCDM can adapt to different dose levels without retraining all model parameters, thereby improving reconstruction flexibility. Specifically, the two constraint modules, named the Nuclear Transformer Constraint (NTC) and the Encoding Nexus Constraint (ENC), serve to refine the pre-trained diffusion model. The NTC leverages the nuclear norm as an approximation for matrix rank minimization, integrates the low-rank property into the Transformer architecture, and enables efficient information extraction from low-dose images and conversion into compressed feature representations in the latent space. Subsequently, the ENC utilizes these compressed feature representations to encode and control the pre-trained diffusion model, ultimately obtaining reconstructed PET images in the pixel space. In clinical reconstruction, the compressed feature representations from NTC help select the most suitable ENC for efficient unknown low-dose PET reconstruction. Experiments conducted on the UDPET public dataset and the Clinical dataset demonstrated that DCDM outperforms state-of-the-art methods on known dose reduction factors (DRF) and generalizes well to unknown DRF scenarios, proving valuable even at ultra-low dose levels, such as 1% of the full dose.
MED-PHJul 9, 2025
Diffusion-Assisted Frequency Attention Model for Whole-body Low-field MRI ReconstructionXin Xie, Yu Guan, Zhuoxu Cui et al.
By integrating the generative strengths of diffusion models with the representation capabilities of frequency-domain attention, DFAM effectively enhances reconstruction performance under low-SNR condi-tions. Experimental results demonstrate that DFAM consistently outperforms both conventional reconstruction algorithms and recent learning-based approaches. These findings highlight the potential of DFAM as a promising solution to advance low-field MRI reconstruction, particularly in resource-constrained or underdeveloped clinical settings.
IVJun 30, 2025
PWD: Prior-Guided and Wavelet-Enhanced Diffusion Model for Limited-Angle CTYi Liu, Yiyang Wen, Zekun Zhou et al.
Generative diffusion models have received increasing attention in medical imaging, particularly in limited-angle computed tomography (LACT). Standard diffusion models achieve high-quality image reconstruction but require a large number of sampling steps during inference, resulting in substantial computational overhead. Although skip-sampling strategies have been proposed to improve efficiency, they often lead to loss of fine structural details. To address this issue, we propose a prior information embedding and wavelet feature fusion fast sampling diffusion model for LACT reconstruction. The PWD enables efficient sampling while preserving reconstruction fidelity in LACT, and effectively mitigates the degradation typically introduced by skip-sampling. Specifically, during the training phase, PWD maps the distribution of LACT images to that of fully sampled target images, enabling the model to learn structural correspondences between them. During inference, the LACT image serves as an explicit prior to guide the sampling trajectory, allowing for high-quality reconstruction with significantly fewer steps. In addition, PWD performs multi-scale feature fusion in the wavelet domain, effectively enhancing the reconstruction of fine details by leveraging both low-frequency and high-frequency information. Quantitative and qualitative evaluations on clinical dental arch CBCT and periapical datasets demonstrate that PWD outperforms existing methods under the same sampling condition. Using only 50 sampling steps, PWD achieves at least 1.7 dB improvement in PSNR and 10% gain in SSIM.
IVJun 30, 2025
FD-DiT: Frequency Domain-Directed Diffusion Transformer for Low-Dose CT ReconstructionQiqing Liu, Guoquan Wei, Zekun Zhou et al.
Low-dose computed tomography (LDCT) reduces radiation exposure but suffers from image artifacts and loss of detail due to quantum and electronic noise, potentially impacting diagnostic accuracy. Transformer combined with diffusion models has been a promising approach for image generation. Nevertheless, existing methods exhibit limitations in preserving finegrained image details. To address this issue, frequency domain-directed diffusion transformer (FD-DiT) is proposed for LDCT reconstruction. FD-DiT centers on a diffusion strategy that progressively introduces noise until the distribution statistically aligns with that of LDCT data, followed by denoising processing. Furthermore, we employ a frequency decoupling technique to concentrate noise primarily in high-frequency domain, thereby facilitating effective capture of essential anatomical structures and fine details. A hybrid denoising network is then utilized to optimize the overall data reconstruction process. To enhance the capability in recognizing high-frequency noise, we incorporate sliding sparse local attention to leverage the sparsity and locality of shallow-layer information, propagating them via skip connections for improving feature representation. Finally, we propose a learnable dynamic fusion strategy for optimal component integration. Experimental results demonstrate that at identical dose levels, LDCT images reconstructed by FD-DiT exhibit superior noise and artifact suppression compared to state-of-the-art methods.
IVJun 23, 2025
Adaptive Mask-guided K-space Diffusion for Accelerated MRI ReconstructionQinrong Cai, Yu Guan, Zhibo Chen et al.
As the deep learning revolution marches on, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training, and has demonstrated exceptional performance in multiple fields. Magnetic Resonance Imaging (MRI) reconstruction is a critical task in medical imaging that seeks to recover high-quality images from under-sampled k-space data. However, previous MRI reconstruction strategies usually optimized the entire image domain or k-space, without considering the importance of different frequency regions in the k-space This work introduces a diffusion model based on adaptive masks (AMDM), which utilizes the adaptive adjustment of frequency distribution based on k-space data to develop a hybrid masks mechanism that adapts to different k-space inputs. This enables the effective separation of high-frequency and low-frequency components, producing diverse frequency-specific representations. Additionally, the k-space frequency distribution informs the generation of adaptive masks, which, in turn, guide a closed-loop diffusion process. Experimental results verified the ability of this method to learn specific frequency information and thereby improved the quality of MRI reconstruction, providing a flexible framework for optimizing k-space data using masks in the future.
IVMay 15, 2025
Ordered-subsets Multi-diffusion Model for Sparse-view CT ReconstructionPengfei Yu, Bin Huang, Minghui Zhang et al.
Score-based diffusion models have shown significant promise in the field of sparse-view CT reconstruction. However, the projection dataset is large and riddled with redundancy. Consequently, applying the diffusion model to unprocessed data results in lower learning effectiveness and higher learning difficulty, frequently leading to reconstructed images that lack fine details. To address these issues, we propose the ordered-subsets multi-diffusion model (OSMM) for sparse-view CT reconstruction. The OSMM innovatively divides the CT projection data into equal subsets and employs multi-subsets diffusion model (MSDM) to learn from each subset independently. This targeted learning approach reduces complexity and enhances the reconstruction of fine details. Furthermore, the integration of one-whole diffusion model (OWDM) with complete sinogram data acts as a global information constraint, which can reduce the possibility of generating erroneous or inconsistent sinogram information. Moreover, the OSMM's unsupervised learning framework provides strong robustness and generalizability, adapting seamlessly to varying sparsity levels of CT sinograms. This ensures consistent and reliable performance across different clinical scenarios. Experimental results demonstrate that OSMM outperforms traditional diffusion models in terms of image quality and noise resilience, offering a powerful and versatile solution for advanced CT imaging in sparse-view scenarios.
CVApr 24, 2025
ESDiff: Encoding Strategy-inspired Diffusion Model with Few-shot Learning for Color Image InpaintingJunyan Zhang, Yan Li, Mengxiao Geng et al.
Image inpainting is a technique used to restore missing or damaged regions of an image. Traditional methods primarily utilize information from adjacent pixels for reconstructing missing areas, while they struggle to preserve complex details and structures. Simultaneously, models based on deep learning necessitate substantial amounts of training data. To address this challenge, an encoding strategy-inspired diffusion model with few-shot learning for color image inpainting is proposed in this paper. The main idea of this novel encoding strategy is the deployment of a "virtual mask" to construct high-dimensional objects through mutual perturbations between channels. This approach enables the diffusion model to capture diverse image representations and detailed features from limited training samples. Moreover, the encoding strategy leverages redundancy between channels, integrates with low-rank methods during iterative inpainting, and incorporates the diffusion model to achieve accurate information output. Experimental results indicate that our method exceeds current techniques in quantitative metrics, and the reconstructed images quality has been improved in aspects of texture and structural integrity, leading to more precise and coherent results.