Ruogu Fang

CV
h-index58
33papers
1,229citations
Novelty50%
AI Score59

33 Papers

IVSep 13, 2022Code
DOMINO: Domain-aware Model Calibration in Medical Image Segmentation

Skylar E. Stolte, Kyle Volle, Aprinda Indahlastari et al.

Model calibration measures the agreement between the predicted probability estimates and the true correctness likelihood. Proper model calibration is vital for high-risk applications. Unfortunately, modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability. Medical image segmentation particularly suffers from this due to the natural uncertainty of tissue boundaries. This is exasperated by their loss functions, which favor overconfidence in the majority classes. We address these challenges with DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels. Our experiments demonstrate that our DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation. Our results show that our method can consistently achieve better calibration, higher accuracy, and faster inference times than these methods, especially on rarer classes. This performance is attributed to our domain-aware regularization to inform semantic model calibration. These findings show the importance of semantic ties between class labels in building confidence in deep learning models. The framework has the potential to improve the trustworthiness and reliability of generic medical image segmentation models. The code for this article is available at: https://github.com/lab-smile/DOMINO.

CVFeb 10, 2023Code
DOMINO: Domain-aware Loss for Deep Learning Calibration

Skylar E. Stolte, Kyle Volle, Aprinda Indahlastari et al.

Deep learning has achieved the state-of-the-art performance across medical imaging tasks; however, model calibration is often not considered. Uncalibrated models are potentially dangerous in high-risk applications since the user does not know when they will fail. Therefore, this paper proposes a novel domain-aware loss function to calibrate deep learning models. The proposed loss function applies a class-wise penalty based on the similarity between classes within a given target domain. Thus, the approach improves the calibration while also ensuring that the model makes less risky errors even when incorrect. The code for this software is available at https://github.com/lab-smile/DOMINO.

LGDec 5, 2022
Distributed Pruning Towards Tiny Neural Networks in Federated Learning

Hong Huang, Lan Zhang, Chaoyue Sun et al.

Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches heavily rely on training data for guiding the pruning strategies, making them ineffective for federated learning over distributed and confidential datasets. Additionally, the memory- and computation-intensive pruning process becomes infeasible for recourse-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. We introduce two key modules in FedTiny to adaptively search coarse- and finer-pruned specialized models to fit deployment scenarios with sparse and cheap local computation. First, an adaptive batch normalization selection module is designed to mitigate biases in pruning caused by the heterogeneity of local data. Second, a lightweight progressive pruning module aims to finer prune the models under strict memory and computational budgets, allowing the pruning policy for each layer to be gradually determined rather than evaluating the overall model structure. The experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models to extremely sparse tiny models. FedTiny achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91% and the memory footprint by 94.01% compared to state-of-the-art methods.

CVAug 21, 2023
DOMINO++: Domain-aware Loss Regularization for Deep Learning Generalizability

Skylar E. Stolte, Kyle Volle, Aprinda Indahlastari et al.

Out-of-distribution (OOD) generalization poses a serious challenge for modern deep learning (DL). OOD data consists of test data that is significantly different from the model's training data. DL models that perform well on in-domain test data could struggle on OOD data. Overcoming this discrepancy is essential to the reliable deployment of DL. Proper model calibration decreases the number of spurious connections that are made between model features and class outputs. Hence, calibrated DL can improve OOD generalization by only learning features that are truly indicative of the respective classes. Previous work proposed domain-aware model calibration (DOMINO) to improve DL calibration, but it lacks designs for model generalizability to OOD data. In this work, we propose DOMINO++, a dual-guidance and dynamic domain-aware loss regularization focused on OOD generalizability. DOMINO++ integrates expert-guided and data-guided knowledge in its regularization. Unlike DOMINO which imposed a fixed scaling and regularization rate, DOMINO++ designs a dynamic scaling factor and an adaptive regularization rate. Comprehensive evaluations compare DOMINO++ with DOMINO and the baseline model for head tissue segmentation from magnetic resonance images (MRIs) on OOD data. The OOD data consists of synthetic noisy and rotated datasets, as well as real data using a different MRI scanner from a separate site. DOMINO++'s superior performance demonstrates its potential to improve the trustworthy deployment of DL on real clinical data.

LGFeb 6, 2023
LAVA: Granular Neuron-Level Explainable AI for Alzheimer's Disease Assessment from Fundus Images

Nooshin Yousefzadeh, Charlie Tran, Adolfo Ramirez-Zamora et al.

Alzheimer's Disease (AD) is a progressive neurodegenerative disease and the leading cause of dementia. Early diagnosis is critical for patients to benefit from potential intervention and treatment. The retina has been hypothesized as a diagnostic site for AD detection owing to its anatomical connection with the brain. Developed AI models for this purpose have yet to provide a rational explanation about the decision and neither infer the stage of disease's progression. Along this direction, we propose a novel model-agnostic explainable-AI framework, called Granular Neuron-level Explainer (LAVA), an interpretation prototype that probes into intermediate layers of the Convolutional Neural Network (CNN) models to assess the AD continuum directly from the retinal imaging without longitudinal or clinical evaluation. This method is applied to validate the retinal vasculature as a biomarker and diagnostic modality for Alzheimer's Disease (AD) evaluation. UK Biobank cognitive tests and vascular morphological features suggest LAVA shows strong promise and effectiveness in identifying AD stages across the progression continuum.

NADec 8, 2017
Joint image edge reconstruction and its application in multi-contrast MRI

Yunmei Chen, Ruogu Fang, Xiaojing Ye

We propose a new joint image reconstruction method by recovering edge directly from observed data. More specifically, we reformulate joint image reconstruction with vectorial total-variation regularization as an $l_1$ minimization problem of the Jacobian of the underlying multi-modality or multi-contrast images. Derivation of data fidelity for Jacobian and transformation of noise distribution are also detailed. The new minimization problem yields an optimal $O(1/k^2)$ convergence rate, where $k$ is the iteration number, and the per-iteration cost is low thanks to the close-form matrix-valued shrinkage. We conducted numerical tests on a number multi-contrast magnetic resonance image (MRI) datasets, which show that the proposed method significantly improves reconstruction efficiency and accuracy compared to the state-of-the-arts.

LGFeb 13, 2023
Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

Charlie Tran, Kai Shen, Kang Liu et al.

Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.

6.0CVMay 1
Prediction of Alzheimer's Disease Risk Factors from Retinal Images via Deep Learning: Development and Validation of Biologically Relevant Morphological Associations in the UK Biobank

Seowung Leem, Yunchao Yang, Adam J. Woods et al.

The systemic, metabolic, lifestyle factors have established associations with Alzheimer's Disease (AD) through epidemiologic and AD-specific biomarker studies. Whether colored fundus photography (CFP) contains retinal structural signatures corresponding to these AD-related risk domains remains unclear. To determine whether deep learning (DL) models can predict 12 AD-related risk factors from CFP and to characterize the retinal structures underlying these predictions, thereby assessing whether CFP reflects pathways to AD vulnerability. Using UK Biobank CFPs, DL models were trained using 62,876 images from 44,501 unique participants to predict 12 factors linked to AD incidence: 6 categorical (sex, smoking, sleeplessness, economic status, alcohol use, depression) and 6 continuous (age, age at completing education, BMI, systolic, diastolic blood pressure, HbA1c). Model performance, model saliency, and saliency-derived scores (CAM-Score) were evaluated and compared to retinal morphometry. The scores were also compared between incident-AD cases (average 8.55 years before onset) and matched controls. Performance of DL ranged from AUROC= 0.5654-0.9480 for categorical and R2=-0.0291-0.7620 for continuous factors, outperforming most of the morphometry-machine learning models. Saliency-based score consistently highlighted biologically meaningful regions, particularly the optic nerve head and retinal vasculature. It also aligned with present morphometric variations. Several saliency-based scores differed significantly between incident AD and matched controls, suggesting potential overlap between retinal correlates of risk factors and preclinical AD-associated changes. CFP encodes retinal signatures linked to AD risk factors. Although not diagnostic, DL-derived retinal representations may uncover biologically meaningful risk-related structural changes mirroring the potential AD vulnerability.

75.6AIMar 24
Cerebra: A Multidisciplinary AI Board for Multimodal Dementia Characterization and Risk Assessment

Sheng Liu, Long Chen, Zeyun Zhao et al.

Modern clinical practice increasingly depends on reasoning over heterogeneous, evolving, and incomplete patient data. Although recent advances in multimodal foundation models have improved performance on various clinical tasks, most existing models remain static, opaque, and poorly aligned with real-world clinical workflows. We present Cerebra, an interactive multi-agent AI team that coordinates specialized agents for EHR, clinical notes, and medical imaging analysis. These outputs are synthesized into a clinician-facing dashboard that combines visual analytics with a conversational interface, enabling clinicians to interrogate predictions and contextualize risk at the point of care. Cerebra supports privacy-preserving deployment by operating on structured representations and remains robust when modalities are incomplete. We evaluated Cerebra using a massive multi-institutional dataset spanning 3 million patients from four independent healthcare systems. Cerebra consistently outperformed both state-of-the-art single-modality models and large multimodal language model baselines. In dementia risk prediction, it achieved AUROCs up to 0.80, compared with 0.74 for the strongest single-modality model and 0.68 for language model baselines. For dementia diagnosis, it achieved an AUROC of 0.86, and for survival prediction, a C-index of 0.81. In a reader study with experienced physicians, Cerebra significantly improved expert performance, increasing accuracy by 17.5 percentage points in prospective dementia risk estimation. These results demonstrate Cerebra's potential for interpretable, robust decision support in clinical care.

65.6IVMay 13
A General Bézier Tree Encoding Counterfactual Framework for Retinal-Vessel-Mediated Disease Analysis

Tan Su, Ethan Elio Meidinger, Lin Gu et al.

The geometry of the retinal vessel is a key biomarker of vascular diseases, yet clinical evidence remains primarily observational. Existing generative counterfactuals intervene only at the image-level disease label, failing to isolate explicit anatomical structure. To address this limitation, we propose the Bézier Tree Encoding Counterfactual Framework (BTECF). By abstracting vascular networks into interconnected cubic-Bézier segments, BTECF establishes a disease-agnostic representation in which structural topology is explicitly preserved and atomically perturbable. Coupling this encoding with a diffusion-based generator enables parameter-level do-interventions on explicit geometric axes (e.g., tortuosity, caliber) while preserving background fundus textures. We validate BTECF on diabetic retinopathy, together with independent cohorts for ischemic stroke and Alzheimer's disease. Isolated counterfactual interventions produce dose-responsive shifts in classifier predictions; a matched pixel-drop control attenuates this response by an order of magnitude or more, ruling out out-of-distribution generation artifacts. By enforcing causal isolation between vessel topology and pixel-level confounders, BTECF provides a unified generative paradigm for hypothesis verification across systemic diseases. To support reproducibility, the code will be publicly released upon acceptance.

48.7CVApr 20
REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction

Seowung Leem, Lin Gu, Chenyu You et al.

The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk profiles for predicting incident AD and dementia, on average 8 years before diagnosis (range: 1-11 years). Because real-world risk factors are structured questionnaire data, we translate them into clinically interpretable narratives compatible with pretrained vision-language models (VLMs). We further propose a group-aware contrastive learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors as positive pairs, strengthening multimodal alignment. This unified representation learning framework substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose VLMs, demonstrating the value of jointly modeling retinal biomarkers and clinical risk factors. By providing a generalizable and noninvasive approach for early AD and dementia risk stratification, REVEAL has the potential to enable earlier intervention and improve preventive care at the population level.

53.6AIApr 15
From edges to meaning: Semantic line sketches as a cognitive scaffold for ancient pictograph invention

Seowung Leem, Lin Gu, Ruogu Fang

Humans readily recognize objects from sparse line drawings, a capacity that appears early in development and persists across cultures, suggesting neural rather than purely learned origins. Yet the computational mechanism by which the brain transforms high-level semantic knowledge into low-level visual symbols remains poorly understood. Here we propose that ancient pictographic writing emerged from the brain's intrinsic tendency to compress visual input into stable, boundary-based abstractions. We construct a biologically inspired digital twin of the visual hierarchy that encodes an image into low-level features, generates a contour sketch, and iteratively refines it through top-down feedback guided by semantic representations, mirroring the feedforward and recurrent architecture of the human visual cortex. The resulting symbols bear striking structural resemblance to early pictographs across culturally distant writing systems, including Egyptian hieroglyphs, Chinese oracle bone characters, and proto-cuneiform, and offer candidate interpretations for undeciphered scripts. Our findings support a neuro-computational origin of pictographic writing and establish a framework in which AI can recapitulate the cognitive processes by which humans first externalized perception into symbols.

CVJun 25, 2025Code
OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport

Qin Ren, Yifan Wang, Ruogu Fang et al.

Survival prediction using whole slide images (WSIs) can be formulated as a multiple instance learning (MIL) problem. However, existing MIL methods often fail to explicitly capture pathological heterogeneity within WSIs, both globally -- through long-tailed morphological distributions, and locally through -- tile-level prediction uncertainty. Optimal transport (OT) provides a principled way of modeling such heterogeneity by incorporating marginal distribution constraints. Building on this insight, we propose OTSurv, a novel MIL framework from an optimal transport perspective. Specifically, OTSurv formulates survival predictions as a heterogeneity-aware OT problem with two constraints: (1) global long-tail constraint that models prior morphological distributions to avert both mode collapse and excessive uniformity by regulating transport mass allocation, and (2) local uncertainty-aware constraint that prioritizes high-confidence patches while suppressing noise by progressively raising the total transport mass. We then recast the initial OT problem, augmented by these constraints, into an unbalanced OT formulation that can be solved with an efficient, hardware-friendly matrix scaling algorithm. Empirically, OTSurv sets new state-of-the-art results across six popular benchmarks, achieving an absolute 3.6% improvement in average C-index. In addition, OTSurv achieves statistical significance in log-rank tests and offers high interpretability, making it a powerful tool for survival prediction in digital pathology. Our codes are available at https://github.com/Y-Research-SBU/OTSurv.

IVJun 14, 2024Code
BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation

Joseph Cox, Peng Liu, Skylar E. Stolte et al.

The burgeoning field of brain health research increasingly leverages artificial intelligence (AI) to interpret and analyze neurological data. This study introduces a novel approach towards the creation of medical foundation models by integrating a large-scale multi-modal magnetic resonance imaging (MRI) dataset derived from 41,400 participants in its own. Our method involves a novel two-stage pretraining approach using vision transformers. The first stage is dedicated to encoding anatomical structures in generally healthy brains, identifying key features such as shapes and sizes of different brain regions. The second stage concentrates on spatial information, encompassing aspects like location and the relative positioning of brain structures. We rigorously evaluate our model, BrainFounder, using the Brain Tumor Segmentation (BraTS) challenge and Anatomical Tracings of Lesions After Stroke v2.0 (ATLAS v2.0) datasets. BrainFounder demonstrates a significant performance gain, surpassing the achievements of the previous winning solutions using fully supervised learning. Our findings underscore the impact of scaling up both the complexity of the model and the volume of unlabeled training data derived from generally healthy brains, which enhances the accuracy and predictive capabilities of the model in complex neuroimaging tasks with MRI. The implications of this research provide transformative insights and practical applications in healthcare and make substantial steps towards the creation of foundation models for Medical AI. Our pretrained models and training code can be found at https://github.com/lab-smile/GatorBrain.

IVJan 6, 2020Code
Regression and Learning with Pixel-wise Attention for Retinal Fundus Glaucoma Segmentation and Detection

Peng Liu, Ruogu Fang

Observing retinal fundus images by an ophthalmologist is a major diagnosis approach for glaucoma. However, it is still difficult to distinguish the features of the lesion solely through manual observations, especially, in glaucoma early phase. In this paper, we present two deep learning-based automated algorithms for glaucoma detection and optic disc and cup segmentation. We utilize the attention mechanism to learn pixel-wise features for accurate prediction. In particular, we present two convolutional neural networks that can focus on learning various pixel-wise level features. In addition, we develop several attention strategies to guide the networks to learn the important features that have a major impact on prediction accuracy. We evaluate our methods on the validation dataset and The proposed both tasks' solutions can achieve impressive results and outperform current state-of-the-art methods. \textit{The code is available at \url{https://github.com/cswin/RLPA}}.

CVOct 19, 2019Code
Image Restoration Using Deep Regulated Convolutional Networks

Peng Liu, Xiaoxiao Zhou, Yangjunyi Li et al.

While the depth of convolutional neural networks has attracted substantial attention in the deep learning research, the width of these networks has recently received greater interest. The width of networks, defined as the size of the receptive fields and the density of the channels, has demonstrated crucial importance in low-level vision tasks such as image denoising and restoration. However, the limited generalization ability, due to the increased width of networks, creates a bottleneck in designing wider networks. In this paper, we propose the Deep Regulated Convolutional Network (RC-Net), a deep network composed of regulated sub-network blocks cascaded by skip-connections, to overcome this bottleneck. Specifically, the Regulated Convolution block (RC-block), featured by a combination of large and small convolution filters, balances the effectiveness of prominent feature extraction and the generalization ability of the network. RC-Nets have several compelling advantages: they embrace diversified features through large-small filter combinations, alleviate the hazy boundary and blurred details in image denoising and super-resolution problems, and stabilize the learning process. Our proposed RC-Nets outperform state-of-the-art approaches with significant performance gains in various image restoration tasks while demonstrating promising generalization ability. The code is available at https://github.com/cswin/RC-Nets.

IVOct 18, 2019Code
SDCNet: Smoothed Dense-Convolution Network for Restoring Low-Dose Cerebral CT Perfusion

Peng Liu, Ruogu Fang

With substantial public concerns on potential cancer risks and health hazards caused by the accumulated radiation exposure in medical imaging, reducing radiation dose in X-ray based medical imaging such as Computed Tomography Perfusion (CTP) has raised significant research interests. In this paper, we embrace the deep Convolutional Neural Networks (CNN) based approaches and introduce Smoothed Dense-Convolution Neural Network (SDCNet) to recover high-dose quality CTP images from low-dose ones. SDCNet is composed of sub-network blocks cascaded by skip-connections to infer the noise (differentials) from paired low/high-dose CT scans. SDCNet can effectively remove the noise in real low-dose CT scans and enhance the quality of medical images. We evaluate the proposed architecture on thousands of CT perfusion frames for both reconstructed image denoising and perfusion map quantification including cerebral blood flow (CBF) and cerebral blood volume (CBV). SDCNet achieves high performance in both visual and quantitative results with promising computational efficiency, comparing favorably with state-of-the-art approaches. \textit{The code is available at \url{https://github.com/cswin/RC-Nets}}.

IVOct 16, 2019Code
CFEA: Collaborative Feature Ensembling Adaptation for Domain Adaptation in Unsupervised Optic Disc and Cup Segmentation

Peng Liu, Bin Kong, Zhongyu Li et al.

Recently, deep neural networks have demonstrated comparable and even better performance with board-certified ophthalmologists in well-annotated datasets. However, the diversity of retinal imaging devices poses a significant challenge: domain shift, which leads to performance degradation when applying the deep learning models to new testing domains. In this paper, we propose a novel unsupervised domain adaptation framework, called Collaborative Feature Ensembling Adaptation (CFEA), to effectively overcome this challenge. Our proposed CFEA is an interactive paradigm which presents an exquisite of collaborative adaptation through both adversarial learning and ensembling weights. In particular, we simultaneously achieve domain-invariance and maintain an exponential moving average of the historical predictions, which achieves a better prediction for the unlabeled data, via ensembling weights during training. Without annotating any sample from the target domain, multiple adversarial losses in encoder and decoder layers guide the extraction of domain-invariant features to confuse the domain classifier and meanwhile benefit the ensembling of smoothing weights. Comprehensive experimental results demonstrate that our CFEA model can overcome performance degradation and outperform the state-of-the-art methods in segmenting retinal optic disc and cup from fundus images. \textit{Code is available at \url{https://github.com/cswin/AWC}}.

CVJul 28, 2017Code
Learning Pixel-Distribution Prior with Wider Convolution for Image Denoising

Peng Liu, Ruogu Fang

In this work, we explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn pixel-distribution from noisy data. By increasing CNN's width with large reception fields and more channels in each layer, CNNs can reveal the ability to learn pixel-distribution, which is a prior existing in many different types of noise. The key to our approach is a discovery that wider CNNs tends to learn the pixel-distribution features, which provides the probability of that inference-mapping primarily relies on the priors instead of deeper CNNs with more stacked nonlinear layers. We evaluate our work: Wide inference Networks (WIN) on additive white Gaussian noise (AWGN) and demonstrate that by learning the pixel-distribution in images, WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. \textit{Code and models are available at \url{https://github.com/cswin/WIN}}.

CVJul 17, 2017Code
Wide Inference Network for Image Denoising via Learning Pixel-distribution Prior

Peng Liu, Ruogu Fang

We explore an innovative strategy for image denoising by using convolutional neural networks (CNN) to learn similar pixel-distribution features from noisy images. Many types of image noise follow a certain pixel-distribution in common, such as additive white Gaussian noise (AWGN). By increasing CNN's width with larger reception fields and more channels in each layer, CNNs can reveal the ability to extract more accurate pixel-distribution features. The key to our approach is a discovery that wider CNNs with more convolutions tend to learn the similar pixel-distribution features, which reveals a new strategy to solve low-level vision problems effectively that the inference mapping primarily relies on the priors behind the noise property instead of deeper CNNs with more stacked nonlinear layers. We evaluate our work, Wide inference Networks (WIN), on AWGN and demonstrate that by learning pixel-distribution features from images, WIN-based network consistently achieves significantly better performance than current state-of-the-art deep CNN-based methods in both quantitative and visual evaluations. \textit{Code and models are available at \url{https://github.com/cswin/WIN}}.

13.4CVMay 4
SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT

Tienyu Chang, Tianhao Li, Ruogu Fang et al.

Optical coherence tomography (OCT), a commonly used retinal imaging modality, plays a central role in retinal disease diagnosis by providing high-resolution visualization of retinal layers. While deep learning (DL) has achieved expert-level accuracy in OCT-based retinal disease detection, its "black box" nature poses challenges for clinical adoption, where explainability is essential for clinical trust and regulatory approval. Existing post-hoc explainable AI (XAI) methods often struggle to delineate fine-grained lesion structures, respect anatomical boundaries, or suppress noise, limiting the trustworthiness of their explanations. To bridge these gaps, we propose a Structure-Aware Interpretable Learning (SAIL) framework that integrates retinal anatomical priors at the representation level and couples them with semantic features via a fusion design. Without modifying standard post-hoc explainability methods, this representation yields sharper and more anatomically aligned attribution maps. Comprehensive experiments on diverse OCT datasets demonstrate that our structure-aware method consistently enhances interpretability, producing clinically meaningful and anatomy-aware explanations. Ablation studies further show that strong interpretability requires both structural priors and semantic features, and that properly fusing the two is critical to achieve the best explanation quality. Together, these results highlight structure-aware representations as a key step toward reliable explainability in OCT.

46.0CVMay 4
OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

Tienyu Chang, Zhen Chen, Renjie Liang et al.

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance with clinical practice where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings is frequently impeded by the unavailability of advanced three-dimensional imaging hardware. Here, we present the Ophthalmic multimodal Masked Autoencoder (OphMAE), a multi-imaging foundation model engineered to synergize the volumetric depth of 3D Optical Coherence Tomography (OCT) with the planar context of 2D en face OCT. By implementing a novel cross-modal fusion architecture and a unique adaptive inference mechanism, OphMAE was pre-trained on a massive dataset with of 183,875 paired OCT images derived from 32,765 patients. In a rigorous benchmark encompassing 17 diverse diagnostic tasks with 48,340 paired OCT images from 8,191 patients, the model demonstrated state-of-the-art performance, achieving an Area Under the Curve (AUC) of 96.9% for Age-related Macular Degeneration (AMD) and 97.2% for Diabetic Macular Edema (DME), consistently surpassing existing single-modal and multimodal foundation models. Crucially, OphMAE exhibits robust engineering adaptability: it maintains high diagnostic accuracy, such as 93.7\% AUC for AMD, even when restricted to single-modality 2D inputs, and demonstrates exceptional data efficiency by retaining 95.7% AUC with as few as 500 labeled samples. This work establishes a scalable and adaptable framework for ophthalmic AI, ensuring robust performance across different tasks.

QMDec 13, 2023
Morphological Profiling for Drug Discovery in the Era of Deep Learning

Qiaosi Tang, Ranjala Ratnayake, Gustavo Seabra et al.

Morphological profiling is a valuable tool in phenotypic drug discovery. The advent of high-throughput automated imaging has enabled the capturing of a wide range of morphological features of cells or organisms in response to perturbations at the single-cell resolution. Concurrently, significant advances in machine learning and deep learning, especially in computer vision, have led to substantial improvements in analyzing large-scale high-content images at high-throughput. These efforts have facilitated understanding of compound mechanism-of-action (MOA), drug repurposing, characterization of cell morphodynamics under perturbation, and ultimately contributing to the development of novel therapeutics. In this review, we provide a comprehensive overview of the recent advances in the field of morphological profiling. We summarize the image profiling analysis workflow, survey a broad spectrum of analysis strategies encompassing feature engineering- and deep learning-based approaches, and introduce publicly available benchmark datasets. We place a particular emphasis on the application of deep learning in this pipeline, covering cell segmentation, image representation learning, and multimodal learning. Additionally, we illuminate the application of morphological profiling in phenotypic drug discovery and highlight potential challenges and opportunities in this field.

QMMay 27, 2025
Learning optimal treatment strategies for intraoperative hypotension using deep reinforcement learning

Esra Adiyeke, Tianqi Liu, Venkata Sai Dheeraj Naganaboina et al.

Traditional methods of surgical decision making heavily rely on human experience and prompt actions, which are variable. A data-driven system generating treatment recommendations based on patient states can be a substantial asset in perioperative decision-making, as in cases of intraoperative hypotension, for which suboptimal management is associated with acute kidney injury (AKI), a common and morbid postoperative complication. We developed a Reinforcement Learning (RL) model to recommend optimum dose of intravenous (IV) fluid and vasopressors during surgery to avoid intraoperative hypotension and postoperative AKI. We retrospectively analyzed 50,021 surgeries from 42,547 adult patients who underwent major surgery at a quaternary care hospital between June 2014 and September 2020. Of these, 34,186 surgeries were used for model training and 15,835 surgeries were reserved for testing. We developed a Deep Q-Networks based RL model using 16 variables including intraoperative physiologic time series, total dose of IV fluid and vasopressors extracted for every 15-minute epoch. The model replicated 69% of physician's decisions for the dosage of vasopressors and proposed higher or lower dosage of vasopressors than received in 10% and 21% of the treatments, respectively. In terms of IV fluids, the model's recommendations were within 0.05 ml/kg/15 min of the actual dose in 41% of the cases, with higher or lower doses recommended for 27% and 32% of the treatments, respectively. The model resulted in a higher estimated policy value compared to the physicians' actual treatments, as well as random and zero-drug policies. AKI prevalence was the lowest in patients receiving medication dosages that aligned with model's decisions. Our findings suggest that implementation of the model's policy has the potential to reduce postoperative AKI and improve other outcomes driven by intraoperative hypotension.

NCAug 24, 2025
Association of Timing and Duration of Moderate-to-Vigorous Physical Activity with Cognitive Function and Brain Aging: A Population-Based Study Using the UK Biobank

Wasif Khan, Lin Gu, Noah Hammarlund et al.

Physical activity is a modifiable lifestyle factor with potential to support cognitive resilience. However, the association of moderate-to-vigorous physical activity (MVPA) intensity, and timing, with cognitive function and region-specific brain structure remain poorly understood. We analyzed data from 45,892 UK Biobank participants aged 60 years and older with valid wrist-worn accelerometer data, cognitive testing, and structural brain MRI. MVPA was measured both continuously (mins per week) and categorically (thresholded using >=150 min/week based on WHO guidelines). Associations with cognitive performance and regional brain volumes were evaluated using multivariable linear models adjusted for demographic, socioeconomic, and health-related covariates. We conducted secondary analyses on MVPA timing and subgroup effects. Higher MVPA was associated with better performance across cognitive domains, including reasoning, memory, executive function, and processing speed. These associations persisted in fully adjusted models and were higher among participants meeting WHO guidelines. Greater MVPA was also associated with subcortical brain regions (caudate, putamen, pallidum, thalamus), as well as regional gray matter volumes involved in emotion, working memory, and perceptual processing. Secondary analyses showed that MVPA at any time of day was associated with cognitive functions and brain volume particularly in the midday-afternoon and evening. Sensitivity analysis shows consistent findings across subgroups, with evidence of dose-response relationships. Higher MVPA is associated with preserved brain structure and enhanced cognitive function in later life. Public health strategies to increase MVPA may support healthy cognitive aging and generate substantial economic benefits, with global gains projected to reach USD 760 billion annually by 2050.

CVAug 20, 2025
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering

Shanlin Sun, Yifan Wang, Hanwen Zhang et al.

While multi-step diffusion models have advanced both forward and inverse rendering, existing approaches often treat these problems independently, leading to cycle inconsistency and slow inference speed. In this work, we present Ouroboros, a framework composed of two single-step diffusion models that handle forward and inverse rendering with mutual reinforcement. Our approach extends intrinsic decomposition to both indoor and outdoor scenes and introduces a cycle consistency mechanism that ensures coherence between forward and inverse rendering outputs. Experimental results demonstrate state-of-the-art performance across diverse scenes while achieving substantially faster inference speed compared to other diffusion-based methods. We also demonstrate that Ouroboros can transfer to video decomposition in a training-free manner, reducing temporal inconsistency in video sequences while maintaining high-quality per-frame inverse rendering.

NCAug 7, 2025
Revealing Neurocognitive and Behavioral Patterns by Unsupervised Manifold Learning from Dynamic Brain Data

Zixia Zhou, Junyan Liu, Wei Emma Wu et al.

Dynamic brain data, teeming with biological and functional insights, are becoming increasingly accessible through advanced measurements, providing a gateway to understanding the inner workings of the brain in living subjects. However, the vast size and intricate complexity of the data also pose a daunting challenge in reliably extracting meaningful information across various data sources. This paper introduces a generalizable unsupervised deep manifold learning for exploration of neurocognitive and behavioral patterns. Unlike existing methods that extract patterns directly from the input data as in the existing methods, the proposed Brain-dynamic Convolutional-Network-based Embedding (BCNE) seeks to capture the brain-state trajectories by deciphering the temporospatial correlations within the data and subsequently applying manifold learning to this correlative representation. The performance of BCNE is showcased through the analysis of several important dynamic brain datasets. The results, both visual and quantitative, reveal a diverse array of intriguing and interpretable patterns. BCNE effectively delineates scene transitions, underscores the involvement of different brain regions in memory and narrative processing, distinguishes various stages of dynamic learning processes, and identifies differences between active and passive behaviors. BCNE provides an effective tool for exploring general neuroscience inquiries or individual-specific patterns.

TOMay 12, 2025
Physiology-Informed Generative Multi-Task Network for Contrast-Free CT Perfusion

Wasif Khan, Kyle B. See, Simon Kato et al.

Perfusion imaging is extensively utilized to assess hemodynamic status and tissue perfusion in various organs. Computed tomography perfusion (CTP) imaging plays a key role in the early assessment and planning of stroke treatment. While CTP provides essential perfusion parameters to identify abnormal blood flow in the brain, the use of contrast agents in CTP can lead to allergic reactions and adverse side effects, along with costing USD 4.9 billion worldwide in 2022. To address these challenges, we propose a novel deep learning framework called Multitask Automated Generation of Intermodal CT perfusion maps (MAGIC). This framework combines generative artificial intelligence and physiological information to map non-contrast computed tomography (CT) imaging to multiple contrast-free CTP imaging maps. We demonstrate enhanced image fidelity by incorporating physiological characteristics into the loss terms. Our network was trained and validated using CT image data from patients referred for stroke at UF Health and demonstrated robustness to abnormalities in brain perfusion activity. A double-blinded study was conducted involving seven experienced neuroradiologists and vascular neurologists. This study validated MAGIC's visual quality and diagnostic accuracy showing favorable performance compared to clinical perfusion imaging with intravenous contrast injection. Overall, MAGIC holds great promise in revolutionizing healthcare by offering contrast-free, cost-effective, and rapid perfusion imaging.

CVDec 27, 2024
Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design

Junjie Zhang, Zhimin Zong, Lin Gu et al.

The evolution of colour vision is captivating, as it reveals the adaptive strategies of extinct species while simultaneously inspiring innovations in modern imaging technology. In this study, we present a simplified model of visual transduction in the retina, introducing a novel opsin layer. We quantify evolutionary pressures by measuring machine vision recognition accuracy on colour images shaped by specific opsins. Building on this, we develop an evolutionary conservation optimisation algorithm to reconstruct the spectral sensitivity of opsins, enabling mutation-driven adaptations to to more effectively spot fruits or predators. This model condenses millions of years of evolution within seconds on GPU, providing an experimental framework to test long-standing hypotheses in evolutionary biology , such as vision of early mammals, primate trichromacy from gene duplication, retention of colour blindness, blue-shift of fish rod and multiple rod opsins with bioluminescence. Moreover, the model enables speculative explorations of hypothetical species, such as organisms with eyes adapted to the conditions on Mars. Our findings suggest a minimalist yet effective approach to task-specific camera filter design, optimising the spectral response function to meet application-driven demands. The code will be made publicly available upon acceptance.

LGJun 15, 2024
A Comprehensive Survey of Foundation Models in Medicine

Wasif Khan, Seowung Leem, Kyle B. See et al.

Foundation models (FMs) are large-scale deep learning models trained on massive datasets, often using self-supervised learning techniques. These models serve as a versatile base for a wide range of downstream tasks, including those in medicine and healthcare. FMs have demonstrated remarkable success across multiple healthcare domains. However, existing surveys in this field do not comprehensively cover all areas where FMs have made significant strides. In this survey, we present a comprehensive review of FMs in medicine, focusing on their evolution, learning strategies, flagship models, applications, and associated challenges. We examine how prominent FMs, such as the BERT and GPT families, are transforming various aspects of healthcare, including clinical large language models, medical image analysis, and omics research. Additionally, we provide a detailed taxonomy of FM-enabled healthcare applications, spanning clinical natural language processing, medical computer vision, graph learning, and other biology- and omics- related tasks. Despite the transformative potentials of FMs, they also pose unique challenges. This survey delves into these challenges and highlights open research questions and lessons learned to guide researchers and practitioners. Our goal is to provide valuable insights into the capabilities of FMs in health, facilitating responsible deployment and mitigating associated risks.

IVOct 5, 2021
CADA: Multi-scale Collaborative Adversarial Domain Adaptation for Unsupervised Optic Disc and Cup Segmentation

Peng Liu, Charlie T. Tran, Bin Kong et al.

The diversity of retinal imaging devices poses a significant challenge: domain shift, which leads to performance degradation when applying the deep learning models trained on one domain to new testing domains. In this paper, we propose a multi-scale input along with multiple domain adaptors applied hierarchically in both feature and output spaces. The proposed training strategy and novel unsupervised domain adaptation framework, called Collaborative Adversarial Domain Adaptation (CADA), can effectively overcome the challenge. Multi-scale inputs can reduce the information loss due to the pooling layers used in the network for feature extraction, while our proposed CADA is an interactive paradigm that presents an exquisite collaborative adaptation through both adversarial learning and ensembling weights at different network layers. In particular, to produce a better prediction for the unlabeled target domain data, we simultaneously achieve domain invariance and model generalizability via adversarial learning at multi-scale outputs from different levels of network layers and maintaining an exponential moving average (EMA) of the historical weights during training. Without annotating any sample from the target domain, multiple adversarial losses in encoder and decoder layers guide the extraction of domain-invariant features to confuse the domain classifier. Meanwhile, the ensembling of weights via EMA reduces the uncertainty of adapting multiple discriminator learning. Comprehensive experimental results demonstrate that our CADA model incorporating multi-scale input training can overcome performance degradation and outperform state-of-the-art domain adaptation methods in segmenting retinal optic disc and cup from fundus images stemming from the REFUGE, Drishti-GS, and Rim-One-r3 datasets.

IVOct 20, 2019
KRNET: Image Denoising with Kernel Regulation Network

Peng Liu, Xiaoxiao Zhou, Junyiyang Li et al.

One popular strategy for image denoising is to design a generalized regularization term that is capable of exploring the implicit prior underlying data observation. Convolutional neural networks (CNN) have shown the powerful capability to learn image prior information through a stack of layers defined by a combination of kernels (filters) on the input. However, existing CNN-based methods mainly focus on synthetic gray-scale images. These methods still exhibit low performance when tackling multi-channel color image denoising. In this paper, we optimize CNN regularization capability by developing a kernel regulation module. In particular, we propose a kernel regulation network-block, referred to as KR-block, by integrating the merits of both large and small kernels, that can effectively estimate features in solving image denoising. We build a deep CNN-based denoiser, referred to as KRNET, via concatenating multiple KR-blocks. We evaluate KRNET on additive white Gaussian noise (AWGN), multi-channel (MC) noise, and realistic noise, where KRNET obtains significant performance gains over state-of-the-art methods across a wide spectrum of noise levels.

CVOct 8, 2019
REFUGE Challenge: A Unified Framework for Evaluating Automated Methods for Glaucoma Assessment from Fundus Photographs

José Ignacio Orlando, Huazhu Fu, João Barbossa Breda et al.

Glaucoma is one of the leading causes of irreversible but preventable blindness in working age populations. Color fundus photography (CFP) is the most cost-effective imaging modality to screen for retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied for medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available data sets. Furthermore, the lack of a standardize benchmark strategy makes difficult to compare existing methods in a uniform way. In order to overcome these issues we set up the Retinal Fundus Glaucoma Challenge, REFUGE (\url{https://refuge.grand-challenge.org}), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a data set of 1200 fundus images with ground truth segmentations and clinical glaucoma labels, currently the largest existing one. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. 12 teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground truth annotations, with complementary outcomes that can be further exploited by ensembling the results.