CV Dec 12, 2022
Efficient Bayesian Uncertainty Estimation for nnU-Net
Yidong Zhao, Changchun Yang, Artur Schweidtmann et al.
The self-configuring nnU-Net has achieved leading performance in a wide range of medical image segmentation challenges. It is widely considered the model of choice and a strong baseline for medical image segmentation. However, despite its extraordinary performance, nnU-Net does not supply a measure of uncertainty to indicate its possible failure. This can be problematic for large-scale image segmentation applications, where data are heterogeneous and nnU-Net may fail without notice. In this work, we introduce a novel method to estimate nnU-Net uncertainty for medical image segmentation. We propose a highly effective scheme for posterior sampling of weight space for Bayesian uncertainty estimation. Unlike previous baseline methods such as Monte Carlo Dropout and mean-field Bayesian Neural Networks, our proposed method does not require a variational architecture and keeps the original nnU-Net architecture intact, thereby preserving its excellent performance and ease of use. Additionally, we boost the segmentation performance over the original nnU-Net by marginalizing multi-modal posterior models. We applied our method to the public ACDC and M&M datasets of cardiac MRI and demonstrated improved uncertainty estimation over a range of baseline methods. The proposed method further strengthens nnU-Net for medical image segmentation in terms of both segmentation accuracy and quality control.
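As a rough illustration of what marginalizing over posterior samples can look like, the sketch below averages softmax maps from several sampled weight sets and derives a per-pixel predictive entropy as the uncertainty map. The function name, array layout, and toy values are assumptions made for this example; the abstract does not specify how the samples are drawn or which uncertainty measure is used.

```python
import numpy as np

def marginalize_predictions(prob_samples):
    """Average softmax maps over weight samples (hypothetical helper).

    prob_samples: array of shape (S, C, H, W), one softmax map per
    sampled weight set. Returns the mean probabilities, the per-pixel
    predictive entropy, and the argmax segmentation.
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    mean_probs = prob_samples.mean(axis=0)  # (C, H, W)
    eps = 1e-12  # avoids log(0) for fully confident pixels
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)  # (H, W)
    segmentation = mean_probs.argmax(axis=0)
    return mean_probs, entropy, segmentation

# Toy input: 3 weight samples, 2 classes, a 1x2 image.
samples = np.array([
    [[[0.9, 0.6]], [[0.1, 0.4]]],
    [[[0.8, 0.4]], [[0.2, 0.6]]],
    [[[1.0, 0.5]], [[0.0, 0.5]]],
])
mean_probs, entropy, segmentation = marginalize_predictions(samples)
```

In this toy input the samples agree on the first pixel and disagree on the second, so the second pixel receives the higher entropy.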
92.8 IV Mar 26
Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
Abdullah Hamdi, Changchun Yang, Xin Gao
Early screening via colonoscopy is critical for colon cancer prevention, yet developing robust AI systems for this domain is hindered by the lack of densely annotated, long-sequence video datasets. Existing datasets predominantly focus on single-class polyp detection and lack the rich spatial, temporal, and linguistic annotations required to evaluate modern Multimodal Large Language Models (MLLMs). To address this critical gap, we introduce Colon-Bench, generated via a novel multi-stage agentic workflow. Our pipeline seamlessly integrates temporal proposals, bounding-box tracking, AI-driven visual confirmation, and human-in-the-loop review to scalably annotate full-procedure videos. The resulting verified benchmark is unprecedented in scope, encompassing 528 videos, 14 distinct lesion categories (including polyps, ulcers, and bleeding), over 300,000 bounding boxes, 213,000 segmentation masks, and 133,000 words of clinical descriptions. We utilize Colon-Bench to rigorously evaluate state-of-the-art MLLMs across lesion classification, Open-Vocabulary Video Object Segmentation (OV-VOS), and video Visual Question Answering (VQA). The MLLM results demonstrate surprisingly high localization performance in medical domains compared to SAM-3. Finally, we analyze common VQA errors from MLLMs to introduce a novel "colon-skill" prompting strategy, improving zero-shot MLLM performance by up to 9.7% across most MLLMs. The dataset and the code are available at https://abdullahamdi.com/colon-bench .
IV Jul 19, 2024
Improving Representation of High-frequency Components for Medical Visual Foundation Models
Yuetan Chu, Yilan Zhang, Zhongyi Han et al.
Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpasses task-specific trained models. This improvement is particularly significant for tasks involving fine-grained details, such as achieving up to a +15% increase in DSC for retina vessel segmentation and a +7% increase in IoU for lung nodule detection. Further experiments quantitatively reveal that Frepa enables superior high-frequency representations and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
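To make the notion of high- and low-frequency image components concrete, the following sketch splits a 2D image with a circular mask in the Fourier domain. It is a generic illustration only: the function name and the `radius` cutoff are assumptions, and it does not reproduce Frepa's masking, perturbation, or adversarial training.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2D image into low- and high-frequency parts.

    A circular mask of the given radius (in frequency bins) around the
    centred DC component selects the low frequencies; everything else
    is treated as high frequency.
    """
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moved to the centre
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
img = rng.random((16, 16))
low, high = split_frequencies(img, radius=3)
low_dc, _ = split_frequencies(img, radius=0)  # keeps only the mean
```

Because the two masks are complementary, `low + high` recovers the input up to floating-point error, and a radius of 0 leaves only the image mean in the low-frequency part.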
CV Nov 30, 2025
Structural Prognostic Event Modeling for Multimodal Cancer Survival Analysis
Yilan Zhang, Li Nanbo, Changchun Yang et al.
The integration of histology images and gene profiles has shown great promise for improving survival prediction in cancer. However, current approaches often struggle to model intra- and inter-modal interactions efficiently and effectively due to the high dimensionality and complexity of the inputs. A major challenge is capturing critical prognostic events that, though few, underlie the complexity of the observed inputs and largely determine patient outcomes. These events, manifested as high-level structural signals such as spatial histologic patterns or pathway co-activations, are typically sparse, patient-specific, and unannotated, making them inherently difficult to uncover. To address this, we propose SlotSPE, a slot-based framework for structural prognostic event modeling. Specifically, inspired by the principle of factorial coding, we compress each patient's multimodal inputs into compact, modality-specific sets of mutually distinctive slots using slot attention. By leveraging these slot representations as encodings for prognostic events, our framework enables both efficient and effective modeling of complex intra- and inter-modal interactions, while also facilitating seamless incorporation of biological priors that enhance prognostic relevance. Extensive experiments on ten cancer benchmarks show that SlotSPE outperforms existing methods in 8 out of 10 cohorts, achieving an overall improvement of 2.9%. It remains robust under missing genomic data and delivers markedly improved interpretability through structured event decomposition.
IV Aug 16, 2020 Code
Deep Learning Enables Robust and Precise Light Focusing on Treatment Needs
Changchun Yang, Hengrong Lan, Fei Gao
Focusing light that passes through body tissue onto only the areas that need treatment, such as tumors, would revolutionize many biomedical imaging and therapy technologies. How to focus light through deep inhomogeneous tissue while overcoming scattering is therefore a Holy Grail of the biomedical field. In this paper, we use deep learning to learn and accelerate the process of phase pre-compensation using wavefront shaping. We present an approach (LoftGAN, light only focuses on treatment needs) for learning the relationship between the phase domain X and the speckle domain Y. Our goal is not only to learn an inverse mapping F: Y -> X, so that we know the corresponding X needed for imaging Y, as in most prior work, but also to make focusing, which is susceptible to disturbances, more robust and precise by ensuring that the obtained phase can be mapped forward back to the speckle. We therefore introduce constraints that enforce F(Y) = X and H(F(Y)) = Y, where H: X -> Y is the transmission mapping. Both simulation and physical experiments are performed to demonstrate the effectiveness of our method for light focusing, and comparative experiments show a crucial improvement in robustness and precision. Code is available at https://github.com/ChangchunYang/LoftGAN.
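The two constraints F(Y) = X and H(F(Y)) = Y can be read as an inverse-mapping loss plus a forward-consistency loss. The sketch below evaluates both with mean squared error on a toy real-valued linear transmission map. The loss form, the function name, and the matrix `A` are assumptions for illustration; they are not taken from the LoftGAN code, and real transmission mappings are generally complex-valued.

```python
import numpy as np

def consistency_losses(F, H, x, y):
    """Return (inverse_loss, cycle_loss) for one phase/speckle pair.

    inverse_loss penalises F(y) != x; cycle_loss penalises H(F(y)) != y.
    """
    x_hat = F(y)
    inverse_loss = float(np.mean((x_hat - x) ** 2))
    cycle_loss = float(np.mean((H(x_hat) - y) ** 2))
    return inverse_loss, cycle_loss

# Toy transmission map: an invertible 3x3 matrix (determinant 9).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])
H = lambda x: A @ x                       # forward map, phase -> speckle
F_exact = lambda y: np.linalg.solve(A, y)  # exact inverse of H
F_naive = lambda y: y                      # deliberately wrong inverse

x = np.array([0.5, -1.0, 2.0])
y = H(x)
exact_losses = consistency_losses(F_exact, H, x, y)
naive_losses = consistency_losses(F_naive, H, x, y)
```

With the exact inverse both losses vanish up to rounding, while the naive inverse is penalised by both terms.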
CV Aug 10, 2020 Code
Deep learning for photoacoustic imaging: a survey
Changchun Yang, Hengrong Lan, Feng Gao et al.
Machine learning has developed dramatically and found many applications in various fields over the past few years. This boom originated in 2009, when a new model emerged, the deep artificial neural network, which began to surpass other established, mature models on some important benchmarks. Later, it was widely used in academia and industry. In fields ranging from image analysis to natural language processing, it has fully exerted its magic and has now become the state-of-the-art machine learning model. Deep neural networks have great potential in medical imaging technology, medical data analysis, medical diagnosis and other healthcare issues, and are being promoted in both pre-clinical and clinical stages. In this review, we give an overview of some new developments and challenges in the application of machine learning to medical image analysis, with a special focus on deep learning in photoacoustic imaging. The aim of this review is threefold: (i) introducing deep learning with some important basics, (ii) reviewing recent works that apply deep learning across the entire chain of photoacoustic imaging, from image reconstruction to disease diagnosis, and (iii) providing some open source materials and other resources for researchers interested in applying deep learning to photoacoustic imaging.
CV Apr 11, 2024
Deep learning-driven pulmonary artery and vein segmentation reveals demography-associated vasculature anatomical differences
Yuetan Chu, Gongning Luo, Longxi Zhou et al.
Pulmonary artery-vein segmentation is crucial for disease diagnosis and surgical planning and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-cost clinical examination routine, has long been considered impossible. Here we propose a High-abundant Pulmonary Artery-vein Segmentation (HiPaS) framework achieving accurate artery-vein segmentation on both non-contrast CT and CTPA across various spatial resolutions. HiPaS first performs spatial normalization on raw CT volumes via a super-resolution module, and then iteratively produces segmentation results at different branch levels by using the lower-level vessel segmentation as a prior for higher-level vessel segmentation. We trained and validated HiPaS on our established multi-centric dataset comprising 1,073 CT volumes with meticulous manual annotations. Both quantitative experiments and clinical evaluation demonstrated the superior performance of HiPaS, which achieved an average Dice score of 91.8% and a sensitivity of 98.0%. Further experiments showed the non-inferiority of HiPaS segmentation on non-contrast CT compared to segmentation on CTPA. Employing HiPaS, we conducted an anatomical study of pulmonary vasculature on 11,784 participants in China (six sites), discovering a new association of pulmonary vessel anatomy with sex, age, and disease states: after controlling for lung volume, vessel abundance is significantly more strongly associated with females than with males, decreases slightly with age, and is also influenced by certain diseases.
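For readers unfamiliar with the two reported metrics, the sketch below computes the Dice score and sensitivity for a pair of binary masks using their standard definitions. The function name and the toy masks are assumptions for this example, and the abstract does not state how scores are averaged across cases or vessel classes.

```python
import numpy as np

def dice_and_sensitivity(pred, target):
    """Dice = 2TP / (2TP + FP + FN); sensitivity = TP / (TP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    # Convention chosen here: empty prediction and empty target score 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) > 0 else 1.0
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    return float(dice), float(sensitivity)

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
dice, sensitivity = dice_and_sensitivity(pred, target)
```

The toy masks share two foreground pixels, with one false positive and one false negative, so both metrics come out at 2/3.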
QM Jul 8, 2025
PAST: A multimodal single-cell foundation model for histopathology and spatial transcriptomics in cancer
Changchun Yang, Haoyang Li, Yushuai Wu et al.
While pathology foundation models have transformed cancer image analysis, they often lack integration with molecular data at single-cell resolution, limiting their utility for precision oncology. Here, we present PAST, a pan-cancer single-cell foundation model trained on 20 million paired histopathology images and single-cell transcriptomes spanning multiple tumor types and tissue contexts. By jointly encoding cellular morphology and gene expression, PAST learns unified cross-modal representations that capture both spatial and molecular heterogeneity at the cellular level. This approach enables accurate prediction of single-cell gene expression, virtual molecular staining, and multimodal survival analysis directly from routine pathology slides. Across diverse cancers and downstream tasks, PAST consistently exceeds the performance of existing approaches, demonstrating robust generalizability and scalability. Our work establishes a new paradigm for pathology foundation models, providing a versatile tool for high-resolution spatial omics, mechanistic discovery, and precision cancer research.
22.2 CV Apr 1
Perturb-and-Restore: Simulation-driven Structural Augmentation Framework for Imbalance Chromosomal Anomaly Detection
Yilan Zhang, Hanbiao Chen, Changchun Yang et al.
Detecting structural chromosomal abnormalities is crucial for accurate diagnosis and management of genetic disorders. However, collecting sufficient structural abnormality data is extremely challenging and costly in clinical practice, and not all abnormal types can be readily collected. As a result, deep learning approaches face significant performance degradation due to the severe imbalance and scarcity of abnormal chromosome data. To address this challenge, we propose Perturb-and-Restore (P&R), a simulation-driven structural augmentation framework that effectively alleviates data imbalance in chromosome anomaly detection. The P&R framework comprises two key components: (1) Structure Perturbation and Restoration Simulation, which generates synthetic abnormal chromosomes by perturbing the chromosomal banding patterns of normal chromosomes and then applying a restoration diffusion network that reconstructs continuous chromosome content and edges, thus eliminating reliance on rare abnormal samples; and (2) Energy-guided Adaptive Sampling, an energy score-based online selection strategy that dynamically prioritizes high-quality synthetic samples by referencing the energy distribution of real samples. To evaluate our method, we construct a comprehensive structural anomaly dataset consisting of over 260,000 chromosome images, including 4,242 abnormal samples spanning 24 categories. Experimental results demonstrate that the P&R framework achieves state-of-the-art (SOTA) performance, surpassing existing methods with an average improvement of 8.92% in sensitivity, 8.89% in precision, and 13.79% in F1-score across all categories.
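One plausible reading of energy score-based selection is sketched below: compute the common logit-based energy, E(x) = -log sum_k exp(logit_k), for real and synthetic samples, and keep the synthetic samples whose energy falls inside a quantile band of the real-sample energies. The energy definition, the quantile rule, the function names, and the toy logits are all assumptions for illustration; the abstract does not give the actual selection criterion.

```python
import numpy as np

def energy_score(logits):
    """Energy E(x) = -logsumexp(logits), computed stably along the last axis."""
    logits = np.asarray(logits, dtype=float)
    m = logits.max(axis=-1, keepdims=True)
    return -(m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1)))

def select_by_real_energy(synth_logits, real_logits, lo_q=0.05, hi_q=0.95):
    """Boolean mask of synthetic samples inside the real energy band (assumed rule)."""
    real_e = energy_score(real_logits)
    lo, hi = np.quantile(real_e, [lo_q, hi_q])
    synth_e = energy_score(synth_logits)
    return (synth_e >= lo) & (synth_e <= hi)

real_logits = np.array([[5.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
synth_logits = np.array([[5.0, 0.5],   # energy close to the real samples
                         [0.0, 0.0]])  # much higher energy than any real sample
keep = select_by_real_energy(synth_logits, real_logits)
```

In the toy data the first synthetic sample has an energy similar to the real ones and is kept, while the second sits far outside the real range and is dropped.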
QM May 21, 2025
An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
Changchun Yang, Weiqian Dai, Yilan Zhang et al.
Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, the development of AI models is hindered by the overwhelming complexity and diversity of chromosomal abnormalities, which require extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the scarcity of comprehensive datasets spanning diverse resource conditions. Here, we introduce CHROMA, a foundation model for cytogenomics, designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on less labelled data and more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal lesions across various aberration types, CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis. It reduces the annotation workload for experts and advances precision oncology through the early detection of rare genomic abnormalities, enabling broad clinical AI applications and making advanced genomic analysis more accessible.
CV May 29, 2021
Compressed Sensing for Photoacoustic Computed Tomography Using an Untrained Neural Network
Hengrong Lan, Juze Zhang, Changchun Yang et al.
Photoacoustic (PA) computed tomography (PACT) shows great potential in various preclinical and clinical applications. A high-quality image requires a large number of measurements, which implies a low imaging rate or a high system cost. Artifacts or sidelobes can contaminate the image if we decrease the number of measured channels or limit the detection view. In this paper, a novel compressed sensing method for PACT using an untrained neural network is proposed, which halves the number of measured channels while recovering sufficient detail. Based on the deep image prior, this method uses a neural network for reconstruction without requiring any additional learning. The model can reconstruct the image from only a few detections using gradient descent. Our method can be combined with other existing regularization to further improve quality. In addition, we introduce a shape prior that helps the model converge to the image. We verify the feasibility of untrained-network-based compressed sensing in PA image reconstruction, and compare this method with a conventional method using total variation minimization. The experimental results show that our proposed method outperforms the traditional compressed sensing method by 32.72% (SSIM) under the same regularization. By sparsely sampling the raw PA data, it could dramatically reduce the number of transducers required and significantly improve the quality of the PA image.
CV Jan 22, 2021
AS-Net: Fast Photoacoustic Reconstruction with Multi-feature Fusion from Sparse Data
Mengjie Guo, Hengrong Lan, Changchun Yang et al.
Photoacoustic (PA) imaging is a biomedical imaging modality capable of acquiring high-contrast images of optical absorption at depths much greater than traditional optical imaging techniques. However, practical instrumentation and geometry limit the number of available acoustic sensors surrounding the imaging target, which results in sparse sensor data. Conventional PA image reconstruction methods produce severe artifacts when they are applied directly to sparse PA data. In this paper, we first propose a novel signal processing method that makes sparse PA raw data more suitable for the neural network while also speeding up image reconstruction. We then propose the Attention Steered Network (AS-Net) for PA reconstruction with multi-feature fusion. AS-Net is validated on different datasets, including simulated photoacoustic data from fundus vasculature phantoms and experimental data from in vivo fish and mice. Notably, the method is also able to eliminate some artifacts present in the ground truth for in vivo data. Results demonstrate that our method provides superior reconstructions at a faster speed.
CV Dec 4, 2020
A Jointed Feature Fusion Framework for Photoacoustic Reconstruction
Hengrong Lan, Changchun Yang, Fei Gao
Photoacoustic (PA) computed tomography (PACT) reconstructs the initial pressure distribution from raw PA signals. Standard reconstruction can produce artifacts due to interference or an ill-posed setup. Recently, deep learning has been used to reconstruct PA images under ill-posed conditions. Most works remove artifacts in the image domain and compensate for the limited view from the dataset. In this paper, we propose a jointed feature fusion framework (JEFF-Net) based on deep learning to reconstruct the PA image using limited-view data. The cross-domain features from limited-view position-wise data and the reconstructed image are fused by a backtracked supervision. Specifically, our method achieves superior performance, with artifacts in the output drastically reduced compared to the ground truth (the full-view reconstructed result). In this paper, one quarter of the position-wise data (32 channels) is fed into the model, which outputs the data for the other three quarters of the view (96 channels). Moreover, two novel losses are designed to suppress artifacts by sufficiently manipulating the superposed data. The numerical and in vivo results demonstrate the superior performance of our method in reconstructing the full-view image without artifacts. Finally, quantitative evaluations show that our proposed method outperforms the ground truth in some metrics.
IV Aug 2, 2019
Y-Net: A Hybrid Deep Learning Reconstruction Framework for Photoacoustic Imaging in vivo
Hengrong Lan, Daohuai Jiang, Changchun Yang et al.
Photoacoustic imaging (PAI) is an emerging non-invasive imaging modality combining the advantages of deep ultrasound penetration and high optical contrast. Image reconstruction is an essential topic in PAI, which is unfortunately an ill-posed problem due to the complex and unknown optical/acoustic parameters in tissue. Conventional algorithms used in PAI (e.g., delay-and-sum) provide a fast solution but leave many artifacts, especially for linear array probes with the limited-view issue. Convolutional neural networks (CNNs) have shown state-of-the-art results in computer vision, and a growing body of CNN-based work has recently appeared in medical image processing. In this paper, we present a non-iterative scheme filling the gap between existing direct-processing and post-processing methods, and propose a new framework, Y-Net: a CNN architecture that reconstructs the PA image by optimizing both raw data and beamformed images at once. The network connects two encoders to one decoder path, making better use of the information in the raw data and the beamformed image. The results on the test set showed good performance compared with conventional reconstruction algorithms and other deep learning methods. Our method is also validated with both in vitro and in vivo experiments, where it still performs better than other existing methods. The proposed Y-Net architecture also has high potential in medical image reconstruction for other imaging modalities beyond PAI.
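The delay-and-sum baseline mentioned above can be sketched in a few lines: for every image pixel, each sensor contributes the raw sample recorded at the one-way acoustic travel time from that pixel. The geometry (a linear array on z = 0), the sampling rate, the speed of sound, and all names below are assumptions for a toy example, not the configuration used in the paper.

```python
import numpy as np

def delay_and_sum(rf, sensor_x, fs, c, grid_x, grid_z):
    """Naive photoacoustic delay-and-sum for a linear array on z = 0.

    rf:       (n_sensors, n_samples) raw channel data
    sensor_x: (n_sensors,) lateral sensor positions in metres
    fs, c:    sampling rate in Hz and assumed speed of sound in m/s
    """
    n_sensors, n_samples = rf.shape
    image = np.zeros((len(grid_z), len(grid_x)))
    for iz, z in enumerate(grid_z):
        for ix, x in enumerate(grid_x):
            dist = np.sqrt((sensor_x - x) ** 2 + z ** 2)  # one-way path
            idx = np.rint(dist / c * fs).astype(int)      # delay in samples
            valid = idx < n_samples
            image[iz, ix] = rf[np.nonzero(valid)[0], idx[valid]].sum()
    return image

# Toy data: a single point source placed on a grid node.
fs, c = 40e6, 1500.0
sensor_x = np.linspace(-0.01, 0.01, 16)
grid_x = np.linspace(-0.005, 0.005, 11)
grid_z = np.linspace(0.015, 0.025, 11)
src_x, src_z = grid_x[5], grid_z[5]
rf = np.zeros((16, 1024))
arrival = np.rint(np.sqrt((sensor_x - src_x) ** 2 + src_z ** 2) / c * fs).astype(int)
rf[np.arange(16), arrival] = 1.0  # one impulse per channel at its arrival time
image = delay_and_sum(rf, sensor_x, fs, c, grid_x, grid_z)
peak = np.unravel_index(image.argmax(), image.shape)
```

In this idealised case all 16 channels add coherently at the source pixel, so the image peaks there. With real limited-view data, the same summation leaves the artifacts that motivate learned reconstruction.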