IVMay 25, 2022
Online Deep Equilibrium Learning for Regularization by DenoisingJiaming Liu, Xiaojian Xu, Weijie Gan et al.
Plug-and-Play Priors (PnP) and Regularization by Denoising (RED) are widely-used frameworks for solving imaging inverse problems by computing fixed-points of operators combining physical measurement models and learned image priors. While traditional PnP/RED formulations have focused on priors specified using image denoisers, there is a growing interest in learning PnP/RED priors that are end-to-end optimal. The recent Deep Equilibrium Models (DEQ) framework has enabled memory-efficient end-to-end learning of PnP/RED priors by implicitly differentiating through the fixed-point equations without storing intermediate activation values. However, the dependence of the computational/memory complexity of the measurement models in PnP/RED on the total number of measurements leaves DEQ impractical for many imaging applications. We propose ODER as a new strategy for improving the efficiency of DEQ through stochastic approximations of the measurement models. We theoretically analyze ODER giving insights into its convergence and ability to approximate the traditional DEQ approach. Our numerical results suggest the potential improvements in training/testing complexity due to ODER on three distinct imaging applications.
IVOct 12, 2022
CoRRECT: A Deep Unfolding Framework for Motion-Corrected Quantitative R2* MappingXiaojian Xu, Weijie Gan, Satya V. V. N. Kothapalli et al.
Quantitative MRI (qMRI) refers to a class of MRI methods for quantifying the spatial distribution of biological tissue parameters. Traditional qMRI methods usually deal separately with artifacts arising from accelerated data acquisition, involuntary physical motion, and magnetic-field inhomogeneities, leading to suboptimal end-to-end performance. This paper presents CoRRECT, a unified deep unfolding (DU) framework for qMRI consisting of a model-based end-to-end neural network, a method for motion-artifact reduction, and a self-supervised learning scheme. The network is trained to produce R2* maps whose k-space data matches the real data by also accounting for motion and field inhomogeneities. When deployed, CoRRECT only uses the k-space data without any pre-computed parameters for motion or inhomogeneity correction. Our results on experimentally collected multi-Gradient-Recalled Echo (mGRE) MRI data show that CoRRECT recovers motion and inhomogeneity artifact-free R2* maps in highly accelerated acquisition settings. This work opens the door to DU methods that can integrate physical measurement models, biophysical signal models, and learned prior models for high-quality qMRI.
CLFeb 4
ERNIE 5.0 Technical ReportHaifeng Wang, Hua Wu, Tian Wu et al.
In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model desinged for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, based on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, thereby guaranteeing efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside comprehensive empirical analysis of elastic training, aiming to offer profound insights to the community.
LGAug 2, 2024
Reconstructing Richtmyer-Meshkov instabilities from noisy radiographs using low dimensional features and attention-based neural networksDaniel A. Serino, Marc L. Klasky, Balasubramanya T. Nadiga et al.
A trained attention-based transformer network can robustly recover the complex topologies given by the Richtmyer-Meshkoff instability from a sequence of hydrodynamic features derived from radiographic images corrupted with blur, scatter, and noise. This approach is demonstrated on ICF-like double shell hydrodynamic simulations. The key component of this network is a transformer encoder that acts on a sequence of features extracted from noisy radiographs. This encoder includes numerous self-attention layers that act to learn temporal dependencies in the input sequences and increase the expressiveness of the model. This approach is demonstrated to exhibit an excellent ability to accurately recover the Richtmyer-Meshkov instability growth rates, even despite the gas-metal interface being greatly obscured by radiographic noise.
IVSep 29, 2024
Swap-Net: A Memory-Efficient 2.5D Network for Sparse-View 3D Cone Beam CT ReconstructionXiaojian Xu, Marc Klasky, Michael T. McCann et al.
Reconstructing 3D cone beam computed tomography (CBCT) images from a limited set of projections is an important inverse problem in many imaging applications from medicine to inertial confinement fusion (ICF). The performance of traditional methods such as filtered back projection (FBP) and model-based regularization is sub-optimal when the number of available projections is limited. In the past decade, deep learning (DL) has gained great popularity for solving CT inverse problems. A typical DL-based method for CBCT image reconstruction is to learn an end-to-end mapping by training a 2D or 3D network. However, 2D networks fail to fully use global information. While 3D networks are desirable, they become impractical as image sizes increase because of the high memory cost. This paper proposes Swap-Net, a memory-efficient 2.5D network for sparse-view 3D CBCT image reconstruction. Swap-Net uses a sequence of novel axes-swapping operations to produce 3D volume reconstruction in an end-to-end fashion without using full 3D convolutions. Simulation results show that Swap-Net consistently outperforms baseline methods both quantitatively and qualitatively in terms of reducing artifacts and preserving details of complex hydrodynamic simulations of relevance to the ICF community.
CVSep 25, 2025
Decipher-MR: A Vision-Language Foundation Model for 3D MRI RepresentationsZhijian Yang, Noel DSouza, Istvan Megyeri et al.
Magnetic Resonance Imaging (MRI) is a critical medical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity pose challenges for automated analysis, particularly in scalable and generalizable machine learning applications. While foundation models have revolutionized natural language and vision tasks, their application to MRI remains limited due to data scarcity and narrow anatomical focus. In this work, we present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset comprising 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations, enabling effective adaptation across broad applications. To enable robust and diverse clinical tasks with minimal computational overhead, Decipher-MR supports a modular design that enables tuning of lightweight, task-specific decoders attached to a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across diverse benchmarks including disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent performance gains over existing foundation models and task-specific approaches. Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.
CVSep 15, 2025
Multi Anatomy X-Ray Foundation ModelNishank Singla, Krisztian Koos, Farzin Haddadpour et al.
X-ray imaging is a ubiquitous in radiology, yet most existing AI foundation models are limited to chest anatomy and fail to generalize across broader clinical tasks. In this work, we introduce XR-0, the multi-anatomy X-ray foundation model using self-supervised learning on a large, private dataset of 1.15 million images spanning diverse anatomical regions and evaluated across 12 datasets and 20 downstream tasks, including classification, retrieval, segmentation, localization, visual grounding, and report generation. XR-0 achieves state-of-the-art performance on most multi-anatomy tasks and remains competitive on chest-specific benchmarks. Our results demonstrate that anatomical diversity and supervision are critical for building robust, general-purpose medical vision models, paving the way for scalable and adaptable AI systems in radiology.
CVJun 4, 2024
Learning Image Priors through Patch-based Diffusion Models for Solving Inverse ProblemsJason Hu, Bowen Song, Xiaojian Xu et al.
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data. Such bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based position-aware diffusion inverse solver, called PaDIS, where we obtain the score function of the whole image through scores of patches and their positional encoding and utilize this as the prior for solving inverse problems. First of all, we show that this diffusion model achieves an improved memory efficiency and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged in with different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and superresolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire image priors in the case of limited training data, demonstrating the data efficiency of our proposed approach by learning patch-based prior.
SPMay 12, 2023
Poisson-Gaussian Holographic Phase Retrieval with Score-based Image PriorZongyu Li, Jason Hu, Xiaojian Xu et al.
Phase retrieval (PR) is a crucial problem in many imaging applications. This study focuses on resolving the holographic phase retrieval problem in situations where the measurements are affected by a combination of Poisson and Gaussian noise, which commonly occurs in optical imaging systems. To address this problem, we propose a new algorithm called "AWFS" that uses the accelerated Wirtinger flow (AWF) with a score function as generative prior. Specifically, we formulate the PR problem as an optimization problem that incorporates both data fidelity and regularization terms. We calculate the gradient of the log-likelihood function for PR and determine its corresponding Lipschitz constant. Additionally, we introduce a generative prior in our regularization framework by using score matching to capture information about the gradient of image prior distributions. We provide theoretical analysis that establishes a critical-point convergence guarantee for the proposed algorithm. The results of our simulation experiments on three different datasets show the following: 1) By using the PG likelihood model, the proposed algorithm improves reconstruction compared to algorithms based solely on Gaussian or Poisson likelihood. 2) The proposed score-based image prior method, performs better than the method based on denoising diffusion probabilistic model (DDPM), as well as plug-and-play alternating direction method of multipliers (PnP-ADMM) and regularization by denoising (RED).
IVFeb 10, 2022
Monotonically Convergent Regularization by DenoisingYuyang Hu, Jiaming Liu, Xiaojian Xu et al.
Regularization by denoising (RED) is a widely-used framework for solving inverse problems by leveraging image denoisers as image priors. Recent work has reported the state-of-the-art performance of RED in a number of imaging applications using pre-trained deep neural nets as denoisers. Despite the recent progress, the stable convergence of RED algorithms remains an open problem. The existing RED theory only guarantees stability for convex data-fidelity terms and nonexpansive denoisers. This work addresses this issue by developing a new monotone RED (MRED) algorithm, whose convergence does not require nonexpansiveness of the deep denoising prior. Simulations on image deblurring and compressive sensing recovery from random matrices show the stability of MRED even when the traditional RED algorithm diverges.
IVFeb 4, 2022
Bregman Plug-and-Play PriorsAbdullah H. Al-Shabili, Xiaojian Xu, Ivan Selesnick et al.
The past few years have seen a surge of activity around integration of deep learning networks and optimization algorithms for solving inverse problems. Recent work on plug-and-play priors (PnP), regularization by denoising (RED), and deep unfolding has shown the state-of-the-art performance of such integration in a variety of applications. However, the current paradigm for designing such algorithms is inherently Euclidean, due to the usage of the quadratic norm within the projection and proximal operators. We propose to broaden this perspective by considering a non-Euclidean setting based on the more general Bregman distance. Our new Bregman Proximal Gradient Method variant of PnP (PnP-BPGM) and Bregman Steepest Descent variant of RED (RED-BSD) replace the traditional updates in PnP and RED from the quadratic norms to more general Bregman distance. We present a theoretical convergence result for PnP-BPGM and demonstrate the effectiveness of our algorithms on Poisson linear inverse problems.
IVJan 22, 2021
SGD-Net: Efficient Model-Based Deep Learning with Theoretical GuaranteesJiaming Liu, Yu Sun, Weijie Gan et al.
Deep unfolding networks have recently gained popularity in the context of solving imaging inverse problems. However, the computational and memory complexity of data-consistency layers within traditional deep unfolding networks scales with the number of measurements, limiting their applicability to large-scale imaging inverse problems. We propose SGD-Net as a new methodology for improving the efficiency of deep unfolding through stochastic approximations of the data-consistency layers. Our theoretical analysis shows that SGD-Net can be trained to approximate batch deep unfolding networks to an arbitrary precision. Our numerical results on intensity diffraction tomography and sparse-view computed tomography show that SGD-Net can match the performance of the batch network at a fraction of training and testing complexity.
LGJun 5, 2020
Scalable Plug-and-Play ADMM with Convergence GuaranteesYu Sun, Zihui Wu, Xiaojian Xu et al.
Plug-and-play priors (PnP) is a broadly applicable methodology for solving inverse problems by exploiting statistical priors specified as denoisers. Recent work has reported the state-of-the-art performance of PnP algorithms using pre-trained deep neural nets as denoisers in a number of imaging applications. However, current PnP algorithms are impractical in large-scale settings due to their heavy computational and memory requirements. This work addresses this issue by proposing an incremental variant of the widely used PnP-ADMM algorithm, making it scalable to large-scale datasets. We theoretically analyze the convergence of the algorithm under a set of explicit assumptions, extending recent theoretical results in the area. Additionally, we show the effectiveness of our algorithm with nonsmooth data-fidelity terms and deep neural net priors, its fast convergence compared to existing PnP algorithms, and its scalability in terms of speed and memory.
CVOct 30, 2018
Image Restoration using Total Variation Regularized Deep Image PriorJiaming Liu, Yu Sun, Xiaojian Xu et al.
In the past decade, sparsity-driven regularization has led to significant improvements in image reconstruction. Traditional regularizers, such as total variation (TV), rely on analytical models of sparsity. However, increasingly the field is moving towards trainable models, inspired from deep learning. Deep image prior (DIP) is a recent regularization framework that uses a convolutional neural network (CNN) architecture without data-driven training. This paper extends the DIP framework by combining it with the traditional TV regularization. We show that the inclusion of TV leads to considerable performance gains when tested on several traditional restoration tasks such as image denoising and deblurring.
OCJul 20, 2018
signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic OptimizationXiaojian Xu, Ulugbek S. Kamilov
Stochastic gradient descent (SGD) is one of the most widely used optimization methods for parallel and distributed processing of large datasets. One of the key limitations of distributed SGD is the need to regularly communicate the gradients between different computation nodes. To reduce this communication bottleneck, recent work has considered a one-bit variant of SGD, where only the sign of each gradient element is used in optimization. In this paper, we extend this idea by proposing a stochastic variant of the proximal-gradient method that also uses one-bit per update element. We prove the theoretical convergence of the method for non-convex optimization under a set of explicit assumptions. Our results indicate that the compressed method can match the convergence rate of the uncompressed one, making the proposed method potentially appealing for distributed processing of large datasets.