CV · Jun 3, 2022
Metrics reloaded: Recommendations for image analysis validation
Lena Maier-Hein, Annika Reinke, Patrick Godau et al. · utoronto
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.
CV · Sep 29, 2022 · Code
3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation
Ho Hin Lee, Shunxing Bao, Yuankai Huo et al.
Recent 3D medical ViTs (e.g., SwinUNETR) achieve state-of-the-art performance on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel sizes (e.g., starting from $7\times7\times7$) to enable larger global receptive fields, inspired by Swin Transformer. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performance with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g., SwinUNETR) on three challenging public datasets for volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR, improving Dice from 0.929 to 0.938 (FLARE2021) and from 0.867 to 0.874 (FeTA2021). We further evaluate the transfer learning capability of 3D UX-Net on AMOS2022 and demonstrate another improvement of $2.27\%$ Dice (from 0.880 to 0.900). The source code for our proposed model is available at https://github.com/MASILab/3DUX-Net.
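The parameter saving that depth-wise convolution offers at a $7\times7\times7$ kernel size can be illustrated with simple arithmetic. The sketch below is not taken from the 3D UX-Net code: the helper `conv3d_params` and the channel width of 48 are hypothetical, and bias terms are ignored.

```python
def conv3d_params(in_ch, out_ch, kernel, groups=1):
    """Weight count of a 3D convolution with a cubic kernel (bias ignored)."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_ch // groups) * out_ch * kernel ** 3


channels = 48  # hypothetical feature width, chosen only for illustration

# Standard convolution: every output channel sees every input channel.
standard = conv3d_params(channels, channels, kernel=7)
# Depth-wise convolution: one 7x7x7 filter per channel (groups == channels).
depthwise = conv3d_params(channels, channels, kernel=7, groups=channels)
# Pointwise (1x1x1) convolution: mixes channels without spatial extent.
pointwise = conv3d_params(channels, channels, kernel=1)

print(standard)   # 790272
print(depthwise)  # 16464
print(pointwise)  # 2304
```

Under these assumptions, a depth-wise layer followed by a pointwise layer uses roughly 2% of the weights of a standard convolution with the same kernel size.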
IV · Sep 28, 2022 · Code
Reducing Positional Variance in Cross-sectional Abdominal CT Slices with Deep Conditional Generative Models
Xin Yu, Qi Yang, Yucheng Tang et al.
A 2D low-dose single-slice abdominal computed tomography (CT) scan enables direct measurements of body composition, which are critical to quantitatively characterizing relationships between health and aging. However, longitudinal analysis of body composition changes using 2D abdominal slices is challenging due to positional variance between longitudinal slices acquired in different years. To reduce the positional variance, we extend conditional generative models to our C-SliceGen, which takes an arbitrary axial slice in the abdominal region as the condition and generates a defined vertebral level slice by estimating the structural changes in the latent space. Experiments on 1170 subjects from an in-house dataset and 50 subjects from the BTCV MICCAI Challenge 2015 show that our model can generate high-quality images in terms of realism and similarity. External experiments on 20 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, validate that our method can harmonize the slice positional variance in terms of muscle and visceral fat area. Our approach provides a promising direction for mapping slices from different vertebral levels to a target slice to reduce positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen.
CV · Aug 15, 2022 · Code
Cross-scale Attention Guided Multi-instance Learning for Crohn's Disease Diagnosis with Pathological Images
Ruining Deng, Can Cui, Lucas W. Remedios et al.
Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological Whole Slide Images (WSIs) to address the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies "natural image driven" MIL algorithms, which overlook the multi-scale (i.e., pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single scale of WSIs (e.g., 20x magnification), while human pathologists usually aggregate the global and local patterns in a multi-scale manner (e.g., by zooming in and out between different magnifications). In this study, we propose a novel cross-scale attention mechanism to explicitly aggregate inter-scale interactions into a single MIL network for Crohn's Disease (CD), which is a form of inflammatory bowel disease. The contribution of this paper is two-fold: (1) a cross-scale attention mechanism is proposed to aggregate features from different resolutions with multi-scale interaction; and (2) differential multi-scale attention visualizations are generated to localize explainable lesion patterns. By training on ~250,000 H&E-stained Ascending Colon (AC) patches from 20 CD patients and 30 healthy control samples at different scales, our approach achieved a superior Area under the Curve (AUC) score of 0.8924 compared with baseline models. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.
IV · Sep 4, 2022 · Code
Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography
Thomas Z. Li, Kaiwen Xu, Riqiang Gao et al.
Features learned from single radiologic images are unable to provide information about whether and how much a lesion may be changing over time. Time-dependent features computed from repeated images can capture those changes and help identify malignant lesions by their temporal behavior. However, longitudinal medical imaging presents the unique challenge of sparse, irregular time intervals in data acquisition. While self-attention has been shown to be a versatile and efficient learning mechanism for time series and natural images, its potential for interpreting temporal distance between sparse, irregularly sampled spatial features has not been explored. In this work, we propose two interpretations of a time-distance vision transformer (ViT) by using (1) vector embeddings of continuous time and (2) a temporal emphasis model to scale self-attention weights. The two algorithms are evaluated based on benign versus malignant lung cancer discrimination of synthetic pulmonary nodules and lung screening computed tomography studies from the National Lung Screening Trial (NLST). Experiments evaluating the time-distance ViTs on synthetic nodules show a fundamental improvement in classifying irregularly sampled longitudinal images when compared to standard ViTs. In cross-validation on screening chest CTs from the NLST, our methods (0.785 and 0.786 AUC respectively) significantly outperform a cross-sectional approach (0.734 AUC) and match the discriminative performance of the leading longitudinal medical imaging algorithm (0.779 AUC) on benign versus malignant classification. This work represents the first self-attention-based framework for classifying longitudinal medical images. Our code is available at https://github.com/tom1193/time-distance-transformer.
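The idea of scaling self-attention weights by temporal distance can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the flipped-sigmoid form of `temporal_emphasis`, its `rate` and `midpoint` parameters, and the choice to rescale weights after the softmax are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_emphasis(delta_t, rate=1.0, midpoint=1.0):
    """Flipped sigmoid (assumed form): larger for small time gaps, decaying toward 0."""
    return 1.0 / (1.0 + math.exp(rate * (delta_t - midpoint)))


def time_scaled_attention(scores, times, query_idx, rate=1.0, midpoint=1.0):
    """Rescale one query's attention weights by temporal emphasis and renormalize."""
    weights = softmax(scores)
    t_query = times[query_idx]
    scaled = [
        w * temporal_emphasis(abs(t_query - t), rate, midpoint)
        for w, t in zip(weights, times)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three scans acquired at irregular intervals (in years); the query is the latest scan.
acquisition_times = [0.0, 0.5, 3.0]
raw_scores = [1.0, 1.0, 1.0]  # equal content similarity, so only time matters
print(time_scaled_attention(raw_scores, acquisition_times, query_idx=2))
```

With equal raw scores, the scan closest in time to the query receives the largest weight, which is the behavior an irregular sampling schedule calls for.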
IV · Nov 30, 2022 · Code
Single Slice Thigh CT Muscle Group Segmentation with Domain Adaptation and Self-Training
Qi Yang, Xin Yu, Ho Hin Lee et al.
Objective: Thigh muscle group segmentation is important for assessment of muscle anatomy, metabolic disease and aging. Many efforts have been put into quantifying muscle tissues with magnetic resonance (MR) imaging, including manual annotation of individual muscles. However, leveraging publicly available annotations in MR images to achieve muscle group segmentation on single-slice computed tomography (CT) thigh images is challenging. Method: We propose an unsupervised domain adaptation pipeline with self-training to transfer labels from 3D MR to single CT slices. First, we transform the image appearance from MR to CT with CycleGAN and feed the synthesized CT images to a segmenter simultaneously. Single CT slices are divided into hard and easy cohorts based on the entropy of pseudo labels inferred by the segmenter. After refining easy cohort pseudo labels based on anatomical assumptions, self-training with easy and hard splits is applied to fine-tune the segmenter. Results: On 152 withheld single CT thigh images, the proposed pipeline achieved a mean Dice of 0.888 (0.041) across all muscle groups, including the sartorius, hamstrings, quadriceps femoris and gracilis muscles. Conclusion: To the best of our knowledge, this is the first pipeline to achieve thigh imaging domain adaptation from MR to CT. The proposed pipeline is effective and robust in extracting muscle groups on 2D single-slice CT thigh images. The container is available for public use at https://github.com/MASILab/DA_CT_muscle_seg.
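The entropy-based split into easy and hard cohorts can be sketched as follows. This is an illustrative sketch only: the abstract does not state how entropy is aggregated per slice or which cutoff is used, so the mean per-pixel Shannon entropy, the function names, and the threshold of 0.5 nats are assumptions.

```python
import math


def mean_entropy(pixel_probs):
    """Mean per-pixel Shannon entropy (in nats) of softmax class probabilities."""
    total = 0.0
    for probs in pixel_probs:
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(pixel_probs)


def split_cohorts(slices, threshold):
    """Assign each slice to the easy cohort if its pseudo labels are confident."""
    easy, hard = [], []
    for name, pixel_probs in slices.items():
        if mean_entropy(pixel_probs) <= threshold:
            easy.append(name)
        else:
            hard.append(name)
    return easy, hard


# Toy slices with two pixels each and three classes per pixel.
slices = {
    "slice_a": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # confident predictions
    "slice_b": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # near-uniform predictions
}
easy, hard = split_cohorts(slices, threshold=0.5)  # hypothetical threshold
print(easy, hard)  # ['slice_a'] ['slice_b']
```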
IV · Jul 13, 2022 · Code
Body Composition Assessment with Limited Field-of-view Computed Tomography: A Semantic Image Extension Perspective
Kaiwen Xu, Thomas Li, Mirza S. Khan et al.
Field-of-view (FOV) tissue truncation beyond the lungs is common in routine lung screening computed tomography (CT). This poses limitations for opportunistic CT-based body composition (BC) assessment, as key anatomical structures are missing. Traditionally, extending the FOV of CT is considered a CT reconstruction problem using limited data. However, this approach relies on projection domain data, which might not be available in application. In this work, we formulate the problem from the semantic image extension perspective, which only requires image data as inputs. The proposed two-stage method identifies a new FOV border based on the estimated extent of the complete body and imputes missing tissues in the truncated region. The training samples are simulated using CT slices with the complete body in the FOV, making the model development self-supervised. We evaluate the validity of the proposed method in automatic BC assessment using lung screening CT with limited FOV. The proposed method effectively restores the missing tissues and reduces the BC assessment error introduced by FOV tissue truncation. In the BC assessment for a large-scale lung screening CT dataset, this correction improves both the intra-subject consistency and the correlation with anthropometric approximations. The developed method is available at https://github.com/MASILab/S-EFOV.
IV · Jan 2, 2023
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Jie Liu, Yixiao Zhang, Jie-Neng Chen et al.
An increasing number of public datasets have shown a marked impact on automated organ segmentation and tumor detection. However, because each dataset is small and only partially labeled, and because diverse tumor types have received limited investigation, the resulting models are often limited to segmenting specific organs/tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. This CLIP-based label encoding captures anatomical relationships, enabling the model to learn a structured feature embedding and segment 25 organs and 6 types of tumors. The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training, and is then evaluated on 6,162 external CT scans from 3 additional datasets. We rank first on the Medical Segmentation Decathlon (MSD) public leaderboard and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Additionally, the Universal Model is computationally more efficient (6x faster) than dataset-specific models, generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks.
CV · Aug 20, 2023 · Code
Cell Spatial Analysis in Crohn's Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures
Shunxing Bao, Sichen Zhu, Vasantha L Kolachala et al.
Crohn's disease (CD) is a chronic and relapsing inflammatory condition that affects segments of the gastrointestinal tract. CD activity is determined by histological findings, particularly the density of neutrophils observed on Hematoxylin and Eosin stains (H&E) imaging. However, understanding the broader morphometry and local cell arrangement beyond cell counting and tissue morphology remains challenging. To address this, we characterize six distinct cell types from H&E images and develop a novel approach for the local spatial signature of each cell. Specifically, we create a 10-cell neighborhood matrix, representing neighboring cell arrangements for each individual cell. Utilizing t-SNE for non-linear spatial projection in scatter-plot and Kernel Density Estimation contour-plot formats, our study examines patterns of differences in the cellular environment associated with the odds ratio of spatial patterns between active CD and control groups. This analysis is based on data collected at the two research institutes. The findings reveal heterogeneous nearest-neighbor patterns, signifying distinct tendencies of cell clustering, with a particular focus on the rectum region. These variations underscore the impact of data heterogeneity on cell spatial arrangements in CD patients. Moreover, the spatial distribution disparities between the two research sites highlight the significance of collaborative efforts among healthcare organizations. All research analysis pipeline tools are available at https://github.com/MASILab/cellNN.
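One way to picture a per-cell neighborhood signature is to count the cell types among each cell's nearest neighbors. The sketch below is an assumption about the encoding, not the cellNN pipeline: the function name, the count-per-type representation, and the Euclidean distance on 2D centroids are all hypothetical, and the toy call uses `k=2` instead of 10 to stay readable.

```python
import math


def neighborhood_matrix(coords, types, n_types, k=10):
    """For each cell, count the cell types among its k nearest neighbors."""
    rows = []
    for i, center in enumerate(coords):
        # Rank all other cells by Euclidean distance to the current cell.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(center, coords[j]),
        )
        counts = [0] * n_types
        for j in others[:k]:
            counts[types[j]] += 1
        rows.append(counts)
    return rows


# Toy example: four cell centroids with integer-coded cell types.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
types = [0, 1, 1, 2]
signature = neighborhood_matrix(coords, types, n_types=3, k=2)
print(signature)  # [[0, 2, 0], [1, 1, 0], [1, 1, 0], [0, 2, 0]]
```

Each row could then be fed to a projection method such as t-SNE, as the abstract describes for the real signatures.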
IV · Apr 9, 2023
Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging
Ruining Deng, Can Cui, Quan Liu et al.
The segment anything model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained on over 1 billion masks from 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). This makes SAM attractive for medical image analysis, especially for digital pathology, where training data are scarce. In this study, we evaluate the zero-shot segmentation performance of the SAM model on representative segmentation tasks in whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, and (3) cell nuclei segmentation. Core Results: The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfactory performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) on each image. We also summarize the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model achieve better performance in dense object segmentation.
IV · Apr 6, 2023 · Code
Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification
Thomas Z. Li, John M. Still, Kaiwen Xu et al.
The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signature expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.
CV · Aug 30, 2022 · Code
Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning
Tianyuan Yao, Chang Qu, Jun Long et al.
With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training a more generalizable AI model has been widely recognized in medical image analysis. However, collecting large-scale task-specific unannotated data can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new resource for obtaining large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. In order to extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not use the traditionally required detection bounding box annotations, with a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. From the results, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The pretrained self-supervised learning model using large-scale mined figures improved the accuracy of downstream image classification tasks with a contrastive learning algorithm. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.
IV · Apr 1, 2023 · Code
Cross-scale Multi-instance Learning for Pathological Image Diagnosis
Ruining Deng, Can Cui, Lucas W. Remedios et al.
Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.
IV · Sep 8, 2023 · Code
Enhancing Hierarchical Transformers for Whole Brain Segmentation with Intracranial Measurements Integration
Xin Yu, Yucheng Tang, Qi Yang et al.
Whole brain segmentation with magnetic resonance imaging (MRI) enables the non-invasive measurement of brain regions, including total intracranial volume (TICV) and posterior fossa volume (PFV). Enhancing the existing whole brain segmentation methodology to incorporate intracranial measurements offers a heightened level of comprehensiveness in the analysis of brain structures. Despite its potential, the task of generalizing deep learning techniques for intracranial measurements faces data availability constraints due to limited manually annotated atlases encompassing whole brain and TICV/PFV labels. In this paper, we enhance the hierarchical transformer UNesT for whole brain segmentation to segment the whole brain into 133 classes and TICV/PFV simultaneously. To address the problem of data scarcity, the model is first pretrained on 4859 T1-weighted (T1w) 3D volumes sourced from 8 different sites. These volumes are processed through a multi-atlas segmentation pipeline for label generation, while TICV/PFV labels are unavailable. Subsequently, the model is fine-tuned with 45 T1w 3D volumes from the Open Access Series of Imaging Studies (OASIS), where both 133 whole brain classes and TICV/PFV labels are available. We evaluate our method with Dice similarity coefficients (DSC). We show that our model is able to conduct precise TICV/PFV estimation while maintaining performance on the 132 brain regions at a comparable level. Code and trained model are available at: https://github.com/MASILab/UNesT/tree/main/wholebrainSeg.
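For readers unfamiliar with the Dice similarity coefficient used for evaluation here, a per-label version can be written in a few lines. This is a generic sketch on flattened label lists, not the authors' evaluation code; returning 1.0 when a label is absent from both prediction and reference is one common convention and is an assumption here.

```python
def dice(pred, truth, label):
    """Dice similarity coefficient for one label: 2|A ∩ B| / (|A| + |B|)."""
    pred_mask = [p == label for p in pred]
    truth_mask = [t == label for t in truth]
    intersection = sum(p and t for p, t in zip(pred_mask, truth_mask))
    denominator = sum(pred_mask) + sum(truth_mask)
    if denominator == 0:
        return 1.0  # assumed convention: label absent from both is a perfect match
    return 2.0 * intersection / denominator


# Flattened toy segmentations with labels 0 (background), 1 and 2.
prediction = [0, 1, 1, 2, 2, 2]
reference = [0, 1, 2, 2, 2, 0]
print(dice(prediction, reference, label=1))
print(dice(prediction, reference, label=2))
```

In the toy example, label 1 overlaps on one voxel out of 2 predicted and 1 reference voxels, giving 2·1/(2+1) = 2/3.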
IV · Sep 17, 2023 · Code
Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization
Xin Yu, Qi Yang, Yucheng Tang et al.
Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution, allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different organs/tissues being captured. To address this issue, we propose C-SliceGen, which takes an arbitrary axial slice in the abdominal region as a condition and generates a pre-defined vertebral level slice by estimating structural changes in the latent space. Our experiments on 2608 volumetric CT scans from two in-house datasets and 50 subjects from the 2015 Multi-Atlas Abdomen Labeling Challenge (BTCV) dataset demonstrate that our model can generate high-quality images that are realistic and similar. We further evaluate our method's capability to harmonize longitudinal positional variation on 1033 subjects from the Baltimore Longitudinal Study of Aging (BLSA) dataset, which contains longitudinal single abdominal slices, and confirm that our method can harmonize the slice positional variance in terms of visceral fat area. This approach provides a promising direction for mapping slices from different vertebral levels to a target slice and reducing positional variance for single-slice longitudinal analysis. The source code is available at: https://github.com/MASILab/C-SliceGen.
CV · Feb 3, 2023
Understanding metric-related pitfalls in image analysis validation
Annika Reinke, Minu D. Tizabi, Michael Baumgartner et al.
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
LG · Mar 25, 2022
Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review
Can Cui, Haichun Yang, Yaohong Wang et al.
The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on the various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multi-modal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies on dealing with such a question. Briefly, this review will include the (1) overview of current multi-modal learning workflows, (2) summarization of multi-modal fusion methods, (3) discussion of the performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.
IV · May 12, 2022 · Code
Pseudo-Label Guided Multi-Contrast Generalization for Non-Contrast Organ-Aware Segmentation
Ho Hin Lee, Yucheng Tang, Riqiang Gao et al.
Non-contrast computed tomography (NCCT) is commonly acquired for lung cancer screening, assessment of general abdominal pain or suspected renal stones, trauma evaluation, and many other indications. However, the absence of contrast makes the boundaries between organs harder to distinguish. In this paper, we propose a novel unsupervised approach that leverages pairwise contrast-enhanced CT (CECT) context to compute non-contrast segmentation without ground-truth labels. Unlike generative adversarial approaches, we compute the pairwise morphological context with CECT to provide teacher guidance instead of generating fake anatomical context. Additionally, we further augment the intensity correlations in 'organ-specific' settings and increase the sensitivity to organ-aware boundaries. We validate our approach on multi-organ segmentation with paired non-contrast and contrast-enhanced CT scans using five-fold cross-validation. Full external validations are performed on an independent non-contrast cohort for aorta segmentation. Compared with the current state of the art in fully supervised abdominal organ segmentation, our proposed pipeline achieves a significantly higher Dice, by 3.98% (internal multi-organ annotated) and 8.00% (external aorta annotated). The code and pretrained models are publicly available at https://github.com/MASILab/ContrastMix.
CV · Oct 16, 2022 · Code
Adaptive Contrastive Learning with Dynamic Correlation for Multi-Phase Organ Segmentation
Ho Hin Lee, Yucheng Tang, Han Liu et al.
Recent studies have demonstrated the superior performance of introducing "scan-wise" contrast labels into contrastive learning for multi-organ segmentation on multi-phase computed tomography (CT). However, such scan-wise labels are limited: (1) they are a coarse classification, which cannot capture the fine-grained "organ-wise" contrast variations across all organs; (2) the label (i.e., contrast phase) is typically provided manually, which is error-prone and may introduce manual biases in defining phases. In this paper, we propose a novel data-driven contrastive loss function that adapts the similar/dissimilar contrast relationship between samples in each minibatch at the organ level. Specifically, as variable levels of contrast exist between organs, we hypothesize that contrast differences at the organ level can bring additional context for defining representations in the latent space. An organ-wise contrast correlation matrix is computed with mean organ intensities under one-hot attention maps. The goal of adapting the organ-driven correlation matrix is to model variable levels of feature separability at different phases. We evaluate our proposed approach on multi-organ segmentation with both non-contrast CT (NCCT) datasets and the MICCAI 2015 BTCV Challenge contrast-enhanced CT (CECT) datasets. Compared to the state-of-the-art approaches, our proposed contrastive loss yields a substantial and significant improvement of 1.41% (from 0.923 to 0.936, p-value$<$0.01) and 2.02% (from 0.891 to 0.910, p-value$<$0.01) on mean Dice scores across all organs with respect to NCCT and CECT cohorts. We further assess the trained model performance with the MICCAI 2021 FLARE Challenge CECT datasets and achieve a substantial improvement of mean Dice score from 0.927 to 0.934 (p-value$<$0.01). The code is available at: https://github.com/MASILab/DCC_CL
CV · Sep 28, 2022
Longitudinal Variability Analysis on Low-dose Abdominal CT with Deep Learning-based Segmentation
Xin Yu, Yucheng Tang, Qi Yang et al.
Metabolic health is increasingly implicated as a risk factor across conditions from cardiology to neurology, and efficient assessment of body composition is critical to quantitatively characterizing these relationships. 2D low-dose single-slice computed tomography (CT) provides a high-resolution, quantitative tissue map, albeit with a limited field of view. Although numerous potential analyses have been proposed for quantifying image context, there has been no comprehensive study of low-dose single-slice CT longitudinal variability with automated segmentation. We studied a total of 1816 slices from 1469 subjects of the Baltimore Longitudinal Study of Aging (BLSA) abdominal dataset using supervised deep learning-based segmentation and an unsupervised clustering method. We selected 300 of the 1469 subjects who had a two-year gap between their first two scans to evaluate longitudinal variability, with measurements including the intraclass correlation coefficient (ICC) and coefficient of variation (CV) in terms of tissue/organ size and mean intensity. We showed that our segmentation methods are stable in longitudinal settings, with Dice ranging from 0.821 to 0.962 for thirteen target abdominal tissue structures. We observed high variability in most organs, with ICC<0.5, and low variability in the area of muscle, abdominal wall, fat and body mask, with average ICC>0.8. We found that the variability in organs is highly related to the cross-sectional position of the 2D slice. Our efforts pave the way for quantitative exploration and quality control to reduce uncertainties in longitudinal analysis.
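The two variability measures named above can be sketched as follows. The abstract does not say which ICC form was used, so the one-way random-effects ICC(1,1) below is an assumption, as are the function names and the toy measurements.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def icc_oneway(measurements):
    """One-way random-effects ICC(1,1); rows are subjects, columns are repeated scans."""
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = statistics.mean(x for row in measurements for x in row)
    subject_means = [statistics.mean(row) for row in measurements]
    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical tissue areas (cm^2) for three subjects at two visits each.
stable_tissue = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]
unstable_tissue = [[10.0, 30.0], [20.0, 20.0], [30.0, 10.0]]
print(icc_oneway(stable_tissue))    # 1.0
print(icc_oneway(unstable_tissue))  # -1.0
print(coefficient_of_variation([90.0, 110.0]))
```

A value near 1 means that differences between subjects dominate differences between repeated scans, which matches the low-variability reading of ICC>0.8 in the abstract.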
LGMar 8, 2022
Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic DataCan Cui, Han Liu, Quan Liu et al.
Integrating cross-department multi-modal data (e.g., radiological, pathological, genomic, and clinical data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such an integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened a door to making such a process more objective and quantitative. Unfortunately, prior work using four modalities for brain cancer survival prediction is limited by a "complete modalities" setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from incomplete radiological, pathological, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and more importantly, how should we use those data? To answer the preceding questions, we generalize multi-modal learning on cross-department multi-modal data to a missing data setting. Our contribution is three-fold: 1) We introduce an optimal multi-modal learning with missing data (MMD) pipeline with optimized hardware consumption and computational efficiency; 2) We extend multi-modal learning on radiological, pathological, genomic, and demographic data into missing data scenarios; 3) A large-scale public dataset (with 962 patients) is collected to systematically evaluate glioma tumor survival prediction using four modalities. The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.
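The C-index reported in this abstract measures how often a model ranks pairs of patients in the correct survival order. A minimal sketch of Harrell's concordance index follows; the abstract does not say which variant the authors used, so this form, the function name and the sample values are assumptions for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: the second patient is censored at t=10.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
print(concordance_index(times, events, [0.9, 0.4, 0.6, 0.1]))   # 1.0
print(concordance_index(times, events, [0.9, 0.4, 0.05, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported change from 0.7624 to 0.8053 in context.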
IVMar 10, 2023
Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image SegmentationHo Hin Lee, Quan Liu, Shunxing Bao et al.
Inspired by vision transformers, depth-wise convolution has been revisited to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, the segmentation performance might saturate and even degrade as the kernel sizes are scaled up (e.g., $21\times 21\times 21$) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited in maintaining an optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances the local convergence with small kernels in parallel, optimal small kernel branches may hinder the computational efficiency of training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current state-of-the-art (SOTA) networks (e.g., 3D UX-Net, SwinUNETR) on 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by the spatial frequency in the human visual system, we extend the variation in kernel convergence to an element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. From the experimental results, RepUX-Net consistently outperforms 3D SOTA benchmarks with internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779) and transfer learning (AMOS: 0.880 to 0.911) scenarios in Dice Score.
CVJun 5, 2023
Robust Fiber Orientation Distribution Function Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRITianyuan Yao, Francois Rheault, Leon Y Cai et al.
Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via the fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advances in data sharing, large-scale multi-site DW-MRI datasets are being made available for multi-site studies. However, measurement variabilities (e.g., inter- and intra-site variability, hardware performance, and sequence design) are inevitable during the acquisition of DW-MRI. Most existing model-based methods (e.g., constrained spherical deconvolution (CSD)) and learning-based methods (e.g., deep learning (DL)) do not explicitly consider such variabilities in fODF modeling, which consequently leads to inferior performance on multi-site and/or longitudinal diffusion studies. In this paper, we propose a novel data-driven deep constrained spherical deconvolution method to explicitly constrain the scan-rescan variabilities for a more reproducible and robust estimation of brain microstructure from repeated DW-MRI scans. Specifically, the proposed method introduces a new 3D volumetric scanner-invariant regularization scheme during the fODF estimation. We study the Human Connectome Project (HCP) young adults test-retest group as well as the MASiVar dataset (with inter- and intra-site scan/rescan data). The Baltimore Longitudinal Study of Aging (BLSA) dataset is employed for external validation. From the experimental results, the proposed data-driven framework outperforms the existing benchmarks in repeated fODF estimation. The proposed method is also assessed on downstream connectivity analysis and shows increased performance in distinguishing subjects with different biomarkers.
CVJun 2, 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectivesJun Li, Junyu Chen, Yucheng Tang et al.
The Transformer, the latest technological advance of deep learning, has gained prevalence in natural language processing and computer vision. Since medical imaging bears some resemblance to computer vision, it is natural to inquire about the status quo of Transformers in medical imaging and ask the question: can Transformer models transform medical imaging? In this paper, we attempt to respond to this inquiry. After a brief introduction to the fundamentals of Transformers, especially in comparison with convolutional neural networks (CNNs), and highlighting the key defining properties that characterize Transformers, we offer a comprehensive review of the state-of-the-art Transformer-based approaches for medical imaging and present current research progress made in the areas of medical image segmentation, recognition, detection, registration, reconstruction, enhancement, etc. In particular, what distinguishes our review is its organization based on the Transformer's key defining properties, which are mostly derived from comparing the Transformer and CNN, and its type of architecture, which specifies the manner in which the Transformer and CNN are combined, all helping readers to best understand the rationale behind the reviewed approaches. We conclude with discussions of future perspectives.
IVMar 4, 2022
Characterizing Renal Structures with 3D Block Aggregate TransformersXin Yu, Yucheng Tang, Yinchi Zhou et al.
Efficiently quantifying renal structures can provide distinct spatial context and facilitate biomarker discovery for kidney morphology. However, the development and evaluation of transformer models to segment the renal cortex, medulla, and collecting system remain challenging due to data inefficiency. Inspired by the hierarchical structures in vision transformers, we propose a novel method using a 3D block aggregation transformer for segmenting kidney components on contrast-enhanced CT scans. We construct the first renal substructure segmentation cohort, with 116 subjects, under institutional review board (IRB) approval. Our method yields state-of-the-art performance (Dice of 0.8467) against the baseline approach of 0.8308 with the data-efficient design. The Pearson R between the proposed method and manual standards reaches 0.9891, indicating strong correlation and reproducibility for volumetric analysis. We extend the proposed method to the public KiTS dataset, where it leads to improved accuracy compared to transformer-based approaches. We show that the 3D block aggregation transformer can achieve local communication between sequence representations without modifying self-attention, and it can serve as an accurate and efficient quantification tool for characterizing renal structures.
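Most abstracts in this list report segmentation quality as a Dice score. A minimal sketch of the metric on flattened binary masks follows; the function name and the toy masks are hypothetical, and real pipelines would operate on 3D label volumes, one foreground class at a time.

```python
def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 true voxels, 2 in common.
pred = [1, 1, 1, 0, 0, 0]
truth = [0, 1, 1, 1, 0, 0]
print(dice_score(pred, truth))  # 2 * 2 / (3 + 3), about 0.667
```

Because the denominator counts both masks, Dice penalizes over-segmentation and under-segmentation alike, which is why small structures tend to score lower than large ones.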
CVJul 3, 2023
Feasibility of Universal Anomaly Detection without Knowing the Abnormality in Medical ImagesCan Cui, Yaohong Wang, Shunxing Bao et al.
Many anomaly detection approaches, especially deep learning methods, have been recently developed to identify abnormal image morphology by only employing normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fracture, cell types). Moreover, even though only the normal images were used in the training process, the abnormal images were often employed during the validation process (e.g., epoch selection, hyper-parameter tuning), which might unintentionally leak the supposedly "unknown" abnormality. In this study, we investigated these two essential aspects of universal anomaly detection in medical images by (1) comparing various anomaly detection methods across four medical datasets, (2) investigating the inevitable but often neglected issue of how to select the optimal anomaly detection model without bias during the validation phase using only normal images, and (3) proposing a simple decision-level ensemble method to leverage the advantages of different kinds of anomaly detection without knowing the abnormality. The results of our experiments indicate that none of the evaluated methods consistently achieved the best performance across all datasets. Our proposed method enhanced the robustness of performance in general (average AUC 0.956).
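A decision-level ensemble of the kind described here combines the anomaly scores of several detectors rather than their internal features. The abstract does not give the combination rule, so the sketch below is an assumption: each detector's scores are z-scored using statistics from normal validation images only (in keeping with the no-abnormal-validation constraint) and then averaged. All names and values are hypothetical.

```python
from statistics import mean, stdev


def normalize(scores, normal_val_scores):
    """Z-score one detector's scores using normal validation images only."""
    mu = mean(normal_val_scores)
    sigma = stdev(normal_val_scores)
    return [(s - mu) / sigma for s in scores]


def ensemble_scores(per_detector_scores, per_detector_normal_val):
    """Average the normalized anomaly scores across detectors."""
    normalized = [
        normalize(s, v)
        for s, v in zip(per_detector_scores, per_detector_normal_val)
    ]
    # zip(*...) regroups the scores image by image.
    return [mean(col) for col in zip(*normalized)]


def auc(scores, labels):
    """AUC as the probability that an abnormal image outscores a normal one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical detectors with very different score ranges.
test_scores = [[0.2, 0.5], [20.0, 40.0]]
normal_val = [[0.1, 0.2, 0.3], [10.0, 20.0, 30.0]]
combined = ensemble_scores(test_scores, normal_val)
print(combined, auc(combined, [0, 1]))
```

Normalizing before averaging keeps a detector with a large raw score range from dominating the ensemble.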
IVNov 6, 2023
Predicting Age from White Matter Diffusivity with Residual LearningChenyu Gao, Michael E. Kim, Ho Hin Lee et al.
Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis, diffusion tensor imaging (DTI) has proven effective in identifying age-related microstructural changes within the brain white matter, thereby presenting itself as a promising additional modality for brain age prediction. Although early studies have sought to harness DTI's advantages for age estimation, there is no evidence that the success of this prediction is owed to the unique microstructural and diffusivity features that DTI provides, rather than the macrostructural features that are also available in DTI data. Therefore, we seek to develop white-matter-specific age estimation to capture deviations from normal white matter aging. Specifically, we deliberately disregard the macrostructural information when predicting age from DTI scalar images, using two distinct methods. The first method relies on extracting only microstructural features from regions of interest. The second applies 3D residual neural networks (ResNets) to learn features directly from the images, which are non-linearly registered and warped to a template to minimize macrostructural variations. When tested on unseen data, the first method yields mean absolute error (MAE) of 6.11 years for cognitively normal participants and MAE of 6.62 years for cognitively impaired participants, while the second method achieves MAE of 4.69 years for cognitively normal participants and MAE of 4.96 years for cognitively impaired participants. We find that the ResNet model captures subtler, non-macrostructural features for brain age prediction.
LGMar 29, 2023
A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRITianyuan Yao, Nancy Newlin, Praitayini Kanakaraj et al.
Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.
AINov 24, 2023
RAISE -- Radiology AI Safety, an End-to-end lifecycle approachM. Jorge Cardoso, Julia Moosbauer, Tessa S. Cook et al.
The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency but it demands a meticulous approach to mitigate potential risks as with any other new technology. Beginning with rigorous pre-deployment evaluation and validation, the focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy for their intended applications. Input and output guardrails implemented during production usage act as an additional layer of protection, identifying and addressing individual failures as they occur. Continuous post-deployment monitoring allows for tracking population-level performance (data drift), fairness, and value delivery over time. Scheduling reviews of post-deployment model performance and educating radiologists about new algorithmic-driven findings is critical for AI to be effective in clinical practice. Recognizing that no single AI solution can provide absolute assurance even when limited to its intended use, the synergistic application of quality assurance at multiple levels - regulatory, clinical, technical, and ethical - is emphasized. Collaborative efforts between stakeholders spanning healthcare systems, industry, academia, and government are imperative to address the multifaceted challenges involved. Trust in AI is an earned privilege, contingent on a broad set of goals, among them transparently demonstrating that the AI adheres to the same rigorous safety, effectiveness and efficacy standards as other established medical technologies. By doing so, developers can instil confidence among providers and patients alike, enabling the responsible scaling of AI and the realization of its potential benefits. The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
CVSep 30, 2023
DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable ConvolutionHo Hin Lee, Quan Liu, Qi Yang et al.
The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predictions. Despite this, existing core operators, ranging from global-local attention to large kernel convolution, exhibit inherent trade-offs and limitations (e.g., global-local range trade-off, aggregating attentional features). We hypothesize that deformable convolution can be an exploratory alternative to combine all advantages from the previous operators, providing long-range dependency, adaptive spatial aggregation and computational efficiency as a foundation backbone. In this work, we introduce 3D DeformUX-Net, a pioneering volumetric CNN model that adeptly navigates the shortcomings traditionally associated with ViTs and large kernel convolution. Specifically, we revisit volumetric deformable convolution in depth-wise setting to adapt long-range dependency with computational efficiency. Inspired by the concepts of structural re-parameterization for convolution kernel weights, we further generate the deformable tri-planar offsets by adapting a parallel branch (starting from $1\times1\times1$ convolution), providing adaptive spatial aggregation across all channels. Our empirical evaluations reveal that the 3D DeformUX-Net consistently outperforms existing state-of-the-art ViTs and large kernel convolution models across four challenging public datasets, spanning various scales from organs (KiTS: 0.680 to 0.720, MSD Pancreas: 0.676 to 0.717, AMOS: 0.871 to 0.902) to vessels (e.g., MSD hepatic vessels: 0.635 to 0.671) in mean Dice.
IVSep 22, 2023
Inter-vendor harmonization of Computed Tomography (CT) reconstruction kernels using unpaired image translationAravind R. Krishnan, Kaiwen Xu, Thomas Li et al.
The reconstruction kernel in computed tomography (CT) generation determines the texture of the image. Consistency in reconstruction kernels is important as the underlying CT texture can impact measurements during quantitative image analysis. Harmonization (i.e., kernel conversion) minimizes differences in measurements due to inconsistent reconstruction kernels. Existing methods investigate harmonization of CT scans in single or multiple manufacturers. However, these methods require paired scans of hard and soft reconstruction kernels that are spatially and anatomically aligned. Additionally, a large number of models need to be trained across different kernel pairs within manufacturers. In this study, we adopt an unpaired image translation approach to investigate harmonization between and across reconstruction kernels from different manufacturers by constructing a multipath cycle generative adversarial network (GAN). We use hard and soft reconstruction kernels from the Siemens and GE vendors from the National Lung Screening Trial dataset. We use 50 scans from each reconstruction kernel and train a multipath cycle GAN. To evaluate the effect of harmonization on the reconstruction kernels, we harmonize 50 scans each from Siemens hard kernel, GE soft kernel and GE hard kernel to a reference Siemens soft kernel (B30f) and evaluate percent emphysema. We fit a linear model by considering the age, smoking status, sex and vendor and perform an analysis of variance (ANOVA) on the emphysema scores. Our approach minimizes differences in emphysema measurement and highlights the impact of age, sex, smoking status and vendor on emphysema quantification.
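Percent emphysema, the measurement evaluated in this abstract, is commonly computed as the share of lung voxels whose attenuation falls below a fixed Hounsfield unit (HU) threshold. The abstract does not state the threshold, so the default of -950 HU below is an assumption (a widely used value), and the function name and sample voxels are hypothetical.

```python
def percent_emphysema(hu_values, lung_mask, threshold_hu=-950.0):
    """Percentage of lung voxels with attenuation below threshold_hu.

    hu_values : flat list of voxel intensities in Hounsfield units
    lung_mask : flat list of 0/1 flags marking voxels inside the lungs
    threshold_hu : assumed cut-off, not taken from the abstract
    """
    lung = [hu for hu, inside in zip(hu_values, lung_mask) if inside]
    if not lung:
        raise ValueError("lung mask is empty")
    low = sum(1 for hu in lung if hu < threshold_hu)
    return 100.0 * low / len(lung)


# Toy example: five lung voxels, three of them below -950 HU.
hu = [-1000.0, -960.0, -900.0, -850.0, -980.0, 40.0]
mask = [1, 1, 1, 1, 1, 0]
print(percent_emphysema(hu, mask))  # 60.0
```

Because the measure depends on a hard intensity cut-off, kernel-dependent texture and noise can move voxels across the threshold, which is why harmonization matters for this measurement.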
IVJun 2, 2023
Multi-Contrast Computed Tomography Atlas of Healthy PancreasYinchi Zhou, Ho Hin Lee, Yucheng Tang et al.
With the substantial diversity in population demographics, such as differences in age and body composition, the volumetric morphology of the pancreas varies greatly, resulting in distinctive variations in shape and appearance. Such variations increase the difficulty of generalizing population-wide pancreas features. A volumetric spatial reference is needed to adapt the morphological variability for organ-specific analysis. Here, we propose a high-resolution computed tomography (CT) atlas framework specifically optimized for the pancreas across multi-contrast CT. We introduce a deep learning-based pre-processing technique to extract the abdominal regions of interest (ROIs) and leverage a hierarchical registration pipeline to align the pancreas anatomy across populations. Briefly, DEEDs affine and non-rigid registration are performed to transfer patient abdominal volumes to a fixed high-resolution atlas template. To generate and evaluate the pancreas atlas template, multi-contrast CT scans of 443 subjects (without reported history of pancreatic disease, age: 15-50 years old) are processed. Compared with other state-of-the-art registration tools, the combination of DEEDs affine and non-rigid registration achieves the best performance for the pancreas label transfer across all contrast phases. We further perform external evaluation with another research cohort of 100 de-identified portal venous scans with 13 organs labeled, achieving the best label transfer performance of 0.504 Dice score in an unsupervised setting. The qualitative representation (e.g., average mapping) of each phase creates a clear boundary of the pancreas and its distinctive contrast appearance. The deformation surface renderings across scales (e.g., small to large volume) further illustrate the generalizability of the proposed atlas template.
CVSep 20, 2024
Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRIZhiyuan Li, Chenyu Gao, Praitayini Kanakaraj et al.
An incomplete field-of-view (FOV) in diffusion magnetic resonance imaging (dMRI) can severely hinder the volumetric and bundle analyses of whole-brain white matter connectivity. Although existing works have investigated imputing the missing regions using deep generative models, it remains unclear how to specifically utilize additional information from paired multi-modality data and whether this can enhance the imputation quality and be useful for downstream tractography. To fill this gap, we propose a novel framework for imputing dMRI scans in the incomplete part of the FOV by integrating the learned diffusion features in the acquired part of the FOV to the complete brain anatomical structure. We hypothesize that by this design the proposed framework can enhance the imputation performance of the dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with incomplete FOV. We tested our framework on two cohorts from different sites with a total of 96 subjects and compared it with a baseline imputation method that treats the information from T1w and dMRI scans equally. The proposed framework achieved significant improvements in imputation performance, as demonstrated by angular correlation coefficient (p < 1E-5), and in downstream tractography accuracy, as demonstrated by Dice score (p < 0.01). Results suggest that the proposed framework improved imputation performance in dMRI scans by specifically utilizing additional information from paired multi-modality data, compared with the baseline method. The imputation achieved by the proposed framework enhances whole brain tractography, and therefore reduces the uncertainty when analyzing bundles associated with neurodegenerative.
CVDec 19, 2025
Medical Imaging AI Competitions Lack FairnessAnnika Reinke, Evangelia Christodoulou, Sthuthi Sadananda et al.
Benchmarking competitions are central to the development of artificial intelligence (AI) in medical imaging, defining performance standards and shaping methodological progress. However, it remains unclear whether these benchmarks provide data that are sufficiently representative, accessible, and reusable to support clinically meaningful AI. In this work, we assess fairness along two complementary dimensions: (1) whether challenge datasets are representative of real-world clinical diversity, and (2) whether they are accessible and legally reusable in line with the FAIR principles. To address this question, we conducted a large-scale systematic study of 241 biomedical image analysis challenges comprising 458 tasks across 19 imaging modalities. Our findings show substantial biases in dataset composition, including geographic location, modality-, and problem type-related biases, indicating that current benchmarks do not adequately reflect real-world clinical diversity. Despite their widespread influence, challenge datasets were frequently constrained by restrictive or ambiguous access conditions, inconsistent or non-compliant licensing practices, and incomplete documentation, limiting reproducibility and long-term reuse. Together, these shortcomings expose foundational fairness limitations in our benchmarking ecosystem and highlight a disconnect between leaderboard success and clinical relevance.
IVApr 7, 2023
Zero-shot CT Field-of-view Completion with Unconditional Generative Diffusion PriorKaiwen Xu, Aravind R. Krishnan, Thomas Z. Li et al.
Anatomically consistent field-of-view (FOV) completion to recover truncated body sections has important applications in quantitative analyses of computed tomography (CT) with limited FOV. Existing solutions based on conditional generative models rely on the fidelity of synthetic truncation patterns at the training phase, which limits the generalizability of the method to potentially unknown types of truncation. In this study, we evaluate a zero-shot method based on a pretrained unconditional generative diffusion prior, where truncation patterns of arbitrary form can be specified at the inference phase. In an evaluation on simulated chest CT slices with synthetic FOV truncation, the method is capable of recovering anatomically consistent body sections and correcting the subcutaneous adipose tissue measurement error caused by FOV truncation. However, the correction accuracy is inferior to that of the conditionally trained counterpart.
AIMar 31
An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image ProcessingLianrui Zuo, Yihao Liu, Gaurav Rudravaram et al.
Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and provenance tracking. Two requirements therefore become central: adaptability, the ability to configure workflows according to dataset-specific conditions and evolving analytical goals; and reproducibility, the guarantee that all transformations and decisions are explicitly recorded and re-executable. Here, we present an artifact-based agent framework that introduces a semantic layer to augment medical image processing. The framework formalizes intermediate and final outputs through an artifact contract, enabling structured interrogation of workflow state and goal-conditioned assembly of configurations from a modular rule library. Execution is delegated to a workflow executor to preserve deterministic computational graph construction and provenance tracking, while the agent operates locally to comply with most privacy constraints. We evaluate the framework on real-world clinical CT and MRI cohorts, demonstrating adaptive configuration synthesis, deterministic reproducibility across repeated executions, and artifact-grounded semantic querying. These results show that adaptive workflow configuration can be achieved without compromising reproducibility in heterogeneous clinical environments.
CVSep 6, 2024
Influence of Early through Late Fusion on Pancreas Segmentation from Imperfectly Registered Multimodal MRILucas W. Remedios, Han Liu, Samuel W. Remedios et al.
Multimodal fusion promises better pancreas segmentation. However, where to perform fusion in models is still an open question. It is unclear if there is a best location to fuse information when analyzing pairs of imperfectly aligned images. Two main alignment challenges in this pancreas segmentation study are 1) the pancreas is deformable and 2) breathing deforms the abdomen. Even after image registration, relevant deformations are often not corrected. We examine how early through late fusion impacts pancreas segmentation. We used 353 pairs of T2-weighted (T2w) and T1-weighted (T1w) abdominal MR images from 163 subjects with accompanying pancreas labels. We used image registration (deeds) to align the image pairs. We trained a collection of basic UNets with different fusion points, spanning from early to late, to assess how early through late fusion influenced segmentation performance on imperfectly aligned images. We assessed generalization of fusion points on nnUNet. The single-modality T2w baseline using a basic UNet model had a Dice score of 0.73, while the same baseline on the nnUNet model achieved 0.80. For the basic UNet, the best fusion approach occurred in the middle of the encoder (early/mid fusion), which led to a statistically significant improvement of 0.0125 on Dice score compared to the baseline. For the nnUNet, the best fusion approach was naïve image concatenation before the model (early fusion), which resulted in a statistically significant Dice score increase of 0.0021 compared to baseline. Fusion in specific blocks can improve performance, but the best blocks for fusion are model specific, and the gains are small. In imperfectly registered datasets, fusion is a nuanced problem, with the art of design remaining vital for uncovering potential insights. Future innovation is needed to better address fusion in cases of imperfect alignment of abdominal image pairs.
CVApr 24, 2023
Exploring shared memory architectures for end-to-end gigapixel deep learningLucas W. Remedios, Leon Y. Cai, Samuel W. Remedios et al.
Deep learning has made great strides in medical imaging, enabled by hardware advances in GPUs. One major constraint for the development of new models has been the saturation of GPU memory resources during training. This is especially true in computational pathology, where images regularly contain more than 1 billion pixels. These pathological images are traditionally divided into small patches to enable deep learning due to hardware limitations. In this work, we explore whether the shared GPU/CPU memory architecture on the M1 Ultra systems-on-a-chip (SoCs) recently released by Apple, Inc. may provide a solution. These affordable systems (less than \$5000) provide access to 128 GB of unified memory (Mac Studio with M1 Ultra SoC). As a proof of concept for gigapixel deep learning, we identified tissue from background on gigapixel areas from whole slide images (WSIs). The model was a modified U-Net (4492 parameters) leveraging large kernels and high stride. The M1 Ultra SoC was able to train the model directly on gigapixel images (16000$\times$64000 pixels, 1.024 billion pixels) with a batch size of 1 using over 100 GB of unified memory for the process at an average speed of 1 minute and 21 seconds per batch with Tensorflow 2/Keras. As expected, the model converged with a high Dice score of 0.989 $\pm$ 0.005. Training up until this point took 111 hours and 24 minutes over 4940 steps. Other high RAM GPUs like the NVIDIA A100 (largest commercially accessible at 80 GB, $\sim$\$15000) are not yet widely available (in preview for select regions on Amazon Web Services at \$40.96/hour as a group of 8). This study is a promising step towards WSI-wise end-to-end deep learning with prevalent network architectures.
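The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Image size: 16000 x 64000 pixels.
width, height = 16000, 64000
pixels = width * height  # 1_024_000_000, i.e. 1.024 billion

# Training time: 111 hours 24 minutes over 4940 steps (batch size 1).
total_seconds = 111 * 3600 + 24 * 60  # 401_040 seconds
steps = 4940
seconds_per_step = total_seconds / steps  # about 81.18 seconds

minutes, seconds = divmod(round(seconds_per_step), 60)
print(pixels, minutes, seconds)  # 1024000000 1 21
```

The result of about 1 minute 21 seconds per step matches the average batch time reported in the abstract, so the total training time and the per-batch speed are mutually consistent.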
IV · Sep 28, 2022
UNesT: Local Spatial Representation Learning with Hierarchical Transformer for Efficient Medical Segmentation
Xin Yu, Qi Yang, Yinchi Zhou et al.
Transformer-based models, capable of learning better global dependencies, have recently demonstrated exceptional representation learning capabilities in computer vision and medical image analysis. A transformer reformats the image into separate patches and realizes global communication via the self-attention mechanism. However, positional information between patches is hard to preserve in such 1D sequences, and its loss can lead to sub-optimal performance when dealing with large numbers of heterogeneous tissues of various sizes in 3D medical image segmentation. Additionally, current methods are not robust and efficient for heavy-duty medical segmentation tasks such as predicting a large number of tissue classes or modeling globally inter-connected tissue structures. To address such challenges, and inspired by the nested hierarchical structures in vision transformers, we propose a novel 3D medical image segmentation method (UNesT), employing a simplified and faster-converging transformer encoder design that achieves local communication among spatially adjacent patch sequences by aggregating them hierarchically. We extensively validate our method on multiple challenging datasets, consisting of multiple modalities, anatomies, and a wide range of tissue classes, including 133 structures in the brain, 14 organs in the abdomen, 4 hierarchical components in the kidneys, inter-connected kidney tumors and brain tumors. We show that UNesT consistently achieves state-of-the-art performance and evaluate its generalizability and data efficiency. In particular, the model performs the complete whole-brain segmentation task with 133 tissue classes in a single network, outperforming the prior state-of-the-art method SLANT27, an ensemble of 27 networks.
CV · Dec 10, 2025
MetaVoxel: Joint Diffusion Modeling of Imaging and Clinical Metadata
Yihao Liu, Chenyu Gao, Lianrui Zuo et al.
Modern deep learning methods have achieved impressive results across tasks ranging from disease classification and continuous biomarker estimation to realistic medical image generation. Most of these approaches are trained to model conditional distributions defined by a specific predictive direction with a specific set of input variables. We introduce MetaVoxel, a generative joint diffusion modeling framework that models the joint distribution over imaging data and clinical metadata by learning a single diffusion process spanning all variables. By capturing the joint distribution, MetaVoxel unifies tasks that traditionally require separate conditional models and supports flexible zero-shot inference using arbitrary subsets of inputs without task-specific retraining. Using more than 10,000 T1-weighted MRI scans paired with clinical metadata from nine datasets, we show that a single MetaVoxel model can perform image generation, age estimation, and sex prediction, achieving performance comparable to established task-specific baselines. Additional experiments highlight its capabilities for flexible inference. Together, these findings demonstrate that joint multimodal diffusion offers a promising direction for unifying medical AI models and enabling broader clinical applicability.
CV · Feb 13
Insertion Network for Image Sequence Correspondence
Dingjie Su, Weixiang Hong, Benoit M. Dawant et al.
We propose a novel method for establishing correspondence between two sequences of 2D images. One particular application of this technique is slice-level content navigation, where the goal is to localize specific 2D slices within a 3D volume or determine the anatomical coverage of a 3D scan based on its 2D slices. This serves as an important preprocessing step for various diagnostic tasks, as well as for automatic registration and segmentation pipelines. Our approach builds sequence correspondence by training a network to learn how to insert a slice from one sequence into the appropriate position in another. This is achieved by encoding contextual representations of each slice and modeling the insertion process using a slice-to-slice attention mechanism. We apply this method to localize manually labeled key slices in body CT scans and compare its performance to the current state-of-the-art alternative known as body part regression, which predicts anatomical position scores for individual slices. Unlike body part regression, which treats each slice independently, our method leverages contextual information from the entire sequence. Experimental results show that the insertion network reduces slice localization errors in supervised settings from 8.4 mm to 5.4 mm, demonstrating a substantial improvement in accuracy.
CV · Jan 28, 2025 · Code
Post-Training Quantization for 3D Medical Image Segmentation: A Practical Study on Real Inference Engines
Chongyu Qu, Ritchie Zhao, Ye Yu et al.
Quantizing deep neural networks, i.e., reducing the precision (bit-width) of their computations, can remarkably decrease memory usage and accelerate processing, making these models more suitable for large-scale medical imaging applications with limited computational resources. However, many existing methods study "fake quantization", which simulates lower-precision operations during inference but does not actually reduce model size or improve real-world inference speed. Moreover, the potential of deploying real 3D low-bit quantization on modern GPUs is still unexplored. In this study, we introduce a real post-training quantization (PTQ) framework that successfully implements true 8-bit quantization on state-of-the-art (SOTA) 3D medical segmentation models, i.e., U-Net, SegResNet, SwinUNETR, nnU-Net, UNesT, TransUNet, ST-UNet, and VISTA3D. Our approach involves two main steps. First, we use TensorRT to perform fake quantization for both weights and activations with an unlabeled calibration dataset. Second, we convert this fake quantization into real quantization via a TensorRT engine on real GPUs, resulting in real-world reductions in model size and inference latency. Extensive experiments demonstrate that our framework effectively performs 8-bit quantization on GPUs without sacrificing model performance. This advancement enables the deployment of efficient deep learning models in medical imaging applications where computational resources are constrained. The code and models have been released, including U-Net and TransUNet pretrained on the BTCV dataset for abdominal (13-label) segmentation, UNesT pretrained on the Whole Brain Dataset for whole brain (133-label) segmentation, and nnU-Net, SegResNet, SwinUNETR and VISTA3D pretrained on TotalSegmentator V2 for full body (104-label) segmentation. https://github.com/hrlblab/PTQ.
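The distinction between fake and real quantization can be illustrated with a generic symmetric per-tensor 8-bit scheme. This is a minimal sketch under that assumption and does not reproduce the TensorRT calibration or engine-building steps the abstract describes; the helper names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.50, -1.27, 0.031, 0.0, 1.0]

# "Real" quantization keeps the integers plus one scale, so storage shrinks.
q_weights, scale = quantize_int8(weights)

# "Fake" quantization immediately converts back to floats: the rounding
# error is simulated, but values are still stored and computed as floats.
fake_quantized = dequantize(q_weights, scale)

print(q_weights)
print(max(abs(w - f) for w, f in zip(weights, fake_quantized)))
```

In this scheme the round-trip error per value is at most half the scale, which is why a well-chosen scale (the role of a calibration set) matters for accuracy.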
CV · Jan 30
AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning
Chongyu Qu, Zhengyi Lu, Yuxiang Lai et al.
Multimodal fusion has emerged as a promising paradigm for disease diagnosis and prognosis, integrating complementary information from heterogeneous data sources such as medical images, clinical records, and radiology reports. However, existing fusion methods process all available modalities through the network, either treating them equally or learning to assign different contribution weights, leaving a fundamental question unaddressed: for a given patient, should certain modalities be used at all? We present AdaFuse, an adaptive multimodal fusion framework that leverages reinforcement learning (RL) to learn patient-specific modality selection and fusion strategies for lung cancer risk prediction. AdaFuse formulates multimodal fusion as a sequential decision process, where the policy network iteratively decides whether to incorporate an additional modality or proceed to prediction based on the information already acquired. This sequential formulation enables the model to condition each selection on previously observed modalities and terminate early when sufficient information is available, rather than committing to a fixed subset upfront. We evaluate AdaFuse on the National Lung Screening Trial (NLST) dataset. Experimental results demonstrate that AdaFuse achieves the highest AUC (0.762) compared to the best single-modality baseline (0.732), the best fixed fusion strategy (0.759), and adaptive baselines including DynMM (0.754) and MoE (0.742), while using fewer FLOPs than all triple-modality methods. Our work demonstrates the potential of reinforcement learning for personalized multimodal fusion in medical imaging, representing a shift from uniform fusion strategies toward adaptive diagnostic pipelines that learn when to consult additional modalities and when existing information suffices for accurate prediction.
LG · May 13
Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
Gaurav Rudravaram, Lianrui Zuo, Karthik Ramadass et al.
Acquisition differences across sites, scanners, and protocols in dMRI introduce variability that complicates structural connectome analysis. This motivates deep learning models that can represent high-dimensional connectomes in a low-dimensional space while explicitly separating acquisition-related effects from biological variation. Conventional dimensionality reduction methods model all variance as continuous, so acquisition effects often get absorbed into a continuous latent space. Recent hybrid latent-space models combine discrete and continuous components to address this, but typically require manual capacity tuning to ensure the discrete component captures the intended variability. We introduce an unsupervised framework that removes this manual tuning by architecturally annealing encoder outputs before decoding, allowing the model to adaptively balance discrete and continuous latent variables during training. To evaluate it, we curated a dataset of N=7,416 structural connectomes derived from dMRI, spanning ages 2 to 102 and 13 studies with 25 unique acquisition-parameter combinations. Of these, 5,900 are cognitively unimpaired, 877 have mild cognitive impairment (MCI), and 639 have Alzheimer's disease (AD). We compare against a standard VAE, PCA with k-means clustering, and hybrid models that anneal only through the loss function. Our architectural annealing produces stronger site learning (ARI=0.53, p<0.05) than these baselines. Results show that a hybrid continuous-discrete latent space, with architectural rather than loss-based annealing, provides a useful unsupervised mechanism for capturing acquisition variability in dMRI: by jointly modeling smooth and categorical structure, the Joint-VAE recovers clusters aligned with scanner and protocol differences.
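Assuming ARI here denotes the adjusted Rand index, the score measures how well the recovered clusters agree with known site labels, corrected for chance agreement. A minimal sketch of the standard pair-counting formula follows; the function name and the toy labels are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must cover the same items")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as trivial agreement
    # Pairs of items grouped together in both labelings, and in each one.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (together_both - expected) / (maximum - expected)


site_labels = [0, 0, 1, 1]
print(adjusted_rand_index(site_labels, [1, 1, 0, 0]))  # same grouping, renamed
print(adjusted_rand_index(site_labels, [0, 1, 0, 1]))  # grouping cuts across sites
```

Because only co-membership of pairs is counted, cluster identifiers need not match the site identifiers, which suits an unsupervised model whose discrete latent codes carry no predefined meaning.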
CV · May 26, 2025 · Code
Rep3D: Re-parameterize Large 3D Kernels with Low-Rank Receptive Modeling for Medical Imaging
Ho Hin Lee, Quan Liu, Shunxing Bao et al.
In contrast to vision transformers, which model long-range dependencies through global self-attention, large kernel convolutions provide a more efficient and scalable alternative, particularly in high-resolution 3D volumetric settings. However, naively increasing kernel size often leads to optimization instability and degradation in performance. Motivated by the spatial bias observed in effective receptive fields (ERFs), we hypothesize that different kernel elements converge at variable rates during training. To support this, we derive a theoretical connection between element-wise gradients and first-order optimization, showing that structurally re-parameterized convolution blocks inherently induce spatially varying learning rates. Building on this insight, we introduce Rep3D, a 3D convolutional framework that incorporates a learnable spatial prior into large kernel training. A lightweight two-stage modulation network generates a receptive-biased scaling mask, adaptively re-weighting kernel updates and enabling local-to-global convergence behavior. Rep3D adopts a plain encoder design with large depthwise convolutions, avoiding the architectural complexity of multi-branch compositions. We evaluate Rep3D on five challenging 3D segmentation benchmarks and demonstrate consistent improvements over state-of-the-art baselines, including transformer-based and fixed-prior re-parameterization methods. By unifying spatial inductive bias with optimization-aware learning, Rep3D offers an interpretable and scalable solution for 3D medical image analysis. The source code is publicly available at https://github.com/leeh43/Rep3D.
CV · Oct 22, 2021 · Code
Circle Representation for Medical Object Detection
Ethan H. Nguyen, Haichun Yang, Ruining Deng et al.
Box representation has been extensively used for object detection in computer vision. Such a representation is efficacious but not necessarily optimized for biomedical objects (e.g., glomeruli), which play an essential role in renal pathology. In this paper, we propose a simple circle representation for medical object detection and introduce CircleNet, an anchor-free detection framework. Compared with the conventional bounding box representation, the proposed bounding circle representation offers three advantages: (1) it is optimized for ball-shaped biomedical objects; (2) it reduces the degrees of freedom compared with the box representation; (3) it is naturally more rotation-invariant. When detecting glomeruli and nuclei on pathological images, the proposed circle representation achieved superior detection performance and was more rotation-invariant than the bounding box. The code has been made publicly available: https://github.com/hrlblab/CircleNet
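A circle is described by three numbers (center x, center y, radius) rather than the four of an axis-aligned box, and rotating the image about the circle's center leaves it unchanged. One way to score the overlap of two circle detections is an intersection-over-union computed from the closed-form circle intersection area. The sketch below is a generic geometric formulation and is not taken from the CircleNet code; the function name is hypothetical.

```python
import math


def circle_iou(circle_a, circle_b):
    """Intersection over union of two circles given as (x, y, radius)."""
    (xa, ya, ra), (xb, yb, rb) = circle_a, circle_b
    d = math.hypot(xb - xa, yb - ya)
    if d >= ra + rb:
        intersection = 0.0  # circles do not overlap
    elif d <= abs(ra - rb):
        intersection = math.pi * min(ra, rb) ** 2  # one circle inside the other
    else:
        # Partial overlap: two circular sectors minus the kite between centers.
        sector_a = ra * ra * math.acos((d * d + ra * ra - rb * rb) / (2 * d * ra))
        sector_b = rb * rb * math.acos((d * d + rb * rb - ra * ra) / (2 * d * rb))
        kite = 0.5 * math.sqrt(
            (-d + ra + rb) * (d + ra - rb) * (d - ra + rb) * (d + ra + rb)
        )
        intersection = sector_a + sector_b - kite
    union = math.pi * (ra * ra + rb * rb) - intersection
    return intersection / union if union > 0 else 0.0


print(circle_iou((0, 0, 1), (0, 0, 1)))  # identical circles
print(circle_iou((0, 0, 1), (1, 0, 1)))  # partial overlap
print(circle_iou((0, 0, 1), (3, 0, 1)))  # disjoint circles
```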
CV · Jul 27, 2021 · Code
Technical Report: Quality Assessment Tool for Machine Learning with Clinical CT
Riqiang Gao, Mirza S. Khan, Yucheng Tang et al.
Image Quality Assessment (IQA) is important for scientific inquiry, especially in medical imaging and machine learning. Potential data quality issues can be exacerbated when human-based workflows use limited views of the data that may obscure digital artifacts. In practice, multiple factors such as network issues, accelerated acquisitions, motion artifacts, and imaging protocol design can impede the interpretation of image collections. The medical image processing community has developed a wide variety of tools for the inspection and validation of imaging data. Yet, IQA of computed tomography (CT) remains an under-recognized challenge, and no user-friendly tool is commonly available to address these potential issues. Here, we create and illustrate a pipeline specifically designed to identify and resolve issues encountered with large-scale data mining of clinically acquired CT data. Using the widely studied National Lung Screening Trial (NLST), we have identified approximately 4% of image volumes with quality concerns out of 17,392 scans. To assess robustness, we applied the proposed pipeline to our internal datasets where we find our tool is generalizable to clinically acquired medical images. In conclusion, the tool has been useful and time-saving for research study of clinical data, and the code and tutorials are publicly available at https://github.com/MASILab/QA_tool.
CV · Jul 19, 2021 · Code
Compound Figure Separation of Biomedical Images with Side Loss
Tianyuan Yao, Chang Qu, Quan Liu et al.
Unsupervised learning algorithms (e.g., self-supervised learning, auto-encoder, contrastive learning) allow deep learning models to learn effective image representations from large-scale unlabeled data. In medical image analysis, even unannotated data can be difficult to obtain for individual labs. Fortunately, national-level efforts have been made to provide efficient access to biomedical image data from previous scientific publications. For instance, NIH has launched the Open-i search engine, which provides a large-scale image database with free access. However, the images in scientific publications include a considerable number of compound figures with subplots. To extract and curate individual subplots, many different compound figure separation approaches have been developed, especially with the recent advances in deep learning. However, previous approaches typically required resource-intensive bounding box annotation to train detection models. In this paper, we propose a simple compound figure separation (SimCFS) framework that uses weak classification annotations from individual images. Our technical contribution is three-fold: (1) we introduce a new side loss that is designed for compound figure separation; (2) we introduce an intra-class image augmentation method to simulate hard cases; (3) the proposed framework enables efficient deployment to new classes of images, without requiring resource-intensive bounding box annotations. From the results, SimCFS achieved a new state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The source code of SimCFS is made publicly available at https://github.com/hrlblab/ImageSeperation.
CV · Jul 28, 2020 · Code
Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking
Mengyang Zhao, Aadarsh Jha, Quan Liu et al.
Recently, single-stage embedding-based deep learning algorithms have gained increasing attention in cell segmentation and tracking. Compared with the traditional "segment-then-associate" two-stage approach, a single-stage algorithm not only simultaneously achieves consistent instance cell segmentation and tracking but also gains superior performance when distinguishing ambiguous pixels on boundaries and overlaps. However, the deployment of an embedding-based algorithm is restricted by slow inference speed (e.g., around 1-2 minutes per frame). In this study, we propose a novel Faster Mean-shift algorithm, which tackles the computational bottleneck of embedding-based cell segmentation and tracking. Different from previous GPU-accelerated fast mean-shift algorithms, a new online seed optimization policy (OSOP) is introduced to adaptively determine the minimal number of seeds, accelerate computation, and save GPU memory. With both embedding simulation and empirical validation via the four cohorts from the ISBI cell tracking challenge, the proposed Faster Mean-shift algorithm achieved a 7-10 times speedup compared to the state-of-the-art embedding-based cell instance segmentation and tracking algorithm. Our Faster Mean-shift algorithm also achieved the highest computational speed compared to other GPU benchmarks with optimized memory consumption. The Faster Mean-shift is a plug-and-play model, which can be employed on other pixel embedding-based clustering inference for medical image analysis. (Plug-and-play model is publicly available: https://github.com/masqm/Faster-Mean-Shift)
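For orientation, plain mean-shift moves each seed to the mean of its neighbors until it stops moving, then groups seeds that land on the same mode. The sketch below is a CPU-only, flat-kernel version with Euclidean distance and a fixed seed set; it is not the GPU implementation or the OSOP seed policy described above, and all names and parameters are illustrative.

```python
import math


def mean_shift(points, bandwidth, seeds=None, max_iter=50, tol=1e-6):
    """Shift each seed to the local mean until it stops moving."""
    seeds = list(points) if seeds is None else list(seeds)
    modes = []
    for seed in seeds:
        center = tuple(seed)
        for _ in range(max_iter):
            neighbors = [p for p in points if math.dist(p, center) <= bandwidth]
            if not neighbors:
                break  # seed has no points in range; leave it where it is
            new_center = tuple(sum(axis) / len(neighbors) for axis in zip(*neighbors))
            shift = math.dist(new_center, center)
            center = new_center
            if shift < tol:
                break
        modes.append(center)
    return modes


def merge_modes(modes, min_dist):
    """Give the same label to modes closer than min_dist to a known center."""
    centers, labels = [], []
    for mode in modes:
        for index, center in enumerate(centers):
            if math.dist(mode, center) < min_dist:
                labels.append(index)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
modes = mean_shift(embeddings, bandwidth=1.0)
centers, labels = merge_modes(modes, min_dist=0.5)
print(labels)
```

Every seed scans all points on each iteration, so the cost grows with the number of seeds; this is the quantity that a seed-selection policy aims to reduce.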
CV · Oct 11, 2019 · Code
Methods and open-source toolkit for analyzing and visualizing challenge results
Manuel Wiesenfarth, Annika Reinke, Bennett A. Landman et al.
Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. Given these shortcomings, the contribution of this paper is two-fold: (1) We present a set of methods to comprehensively analyze and visualize the results of single-task and multi-task challenges and apply them to a number of simulated and real-life challenges to demonstrate their specific strengths and weaknesses; (2) We release the open-source framework challengeR as part of this work to enable fast and wide adoption of the methodology proposed in this paper. Our approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. This is demonstrated by the experiments performed within this work. Our framework could thus become an important tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond.