Michael R. Harowicz

CV
h-index13
3papers
148citations
Novelty23%
AI Score30

3 Papers

CVMay 7, 2024
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets

Fakrul Islam Tushar, Avivah Wang, Lavsen Dahal et al.

Background: Development of artificial intelligence (AI) models for lung cancer screening requires large, well-annotated low-dose computed tomography (CT) datasets and rigorous performance benchmarks. Purpose: To create a reproducible benchmarking resource leveraging the Duke Lung Cancer Screening (DLCS) and multiple public datasets to develop and evaluate models for nodule detection and classification. Materials & Methods: This retrospective study uses the DLCS dataset (1,613 patients; 2,487 nodules) and external datasets including LUNA16, LUNA25, and NLST-3D. For detection, MONAI RetinaNet models were trained on DLCS (DLCS-De) and LUNA16 (LUNA16-De) and evaluated using the Competition Performance Metric (CPM). For nodule-level classification, we compare five strategies: pretrained models (Models Genesis, Med3D), a self-supervised foundation model (FMCB), and ResNet50 with random initialization versus Strategic Warm-Start (ResNet50-SWS) pretrained with detection-derived candidate patches stratified by confidence. Results: For detection on the DLCS test set, DLCS-De achieved sensitivity 0.82 at 2 false positives/scan (CPM 0.63) versus LUNA16-De (0.62, CPM 0.45). For external validation on NLST-3D, DLCS-De (sensitivity 0.72, CPM 0.58) also outperformed LUNA16-De (sensitivity 0.64, CPM 0.49). For classification across multiple datasets, ResNet50-SWS attained AUCs of 0.71 (DLCS; 95% CI, 0.61-0.81), 0.90 (LUNA16; 0.87-0.93), 0.81 (NLST-3D; 0.79-0.82), and 0.80 (LUNA25; 0.78-0.82), matching or exceeding pretrained/self-supervised baselines. Performance differences reflected dataset label standards. Conclusion: This work establishes a standardized benchmarking resource for lung cancer AI research, supporting model development, validation, and translation. All code, models, and data are publicly released to promote reproducibility.

CVSep 18, 2025
Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease

Jisoo Lee, Michael R. Harowicz, Yuwen Chen et al.

This study evaluates publicly available deep-learning based lung segmentation models in transplant-eligible patients to determine their performance across disease severity levels, pathology categories, and lung sides, and to identify limitations impacting their use in preoperative planning in lung transplantation. This retrospective study included 32 patients who underwent chest CT scans at Duke University Health System between 2017 and 2019 (total of 3,645 2D axial slices). Patients with standard axial CT scans were selected based on the presence of two or more lung pathologies of varying severity. Lung segmentation was performed using three previously developed deep learning models: Unet-R231, TotalSegmentator, MedSAM. Performance was assessed using quantitative metrics (volumetric similarity, Dice similarity coefficient, Hausdorff distance) and a qualitative measure (four-point clinical acceptability scale). Unet-R231 consistently outperformed TotalSegmentator and MedSAM in general, for different severity levels, and pathology categories (p<0.05). All models showed significant performance declines from mild to moderate-to-severe cases, particularly in volumetric similarity (p<0.05), without significant differences among lung sides or pathology types. Unet-R231 provided the most accurate automated lung segmentation among evaluated models with TotalSegmentator being a close second, though their performance declined significantly in moderate-to-severe cases, emphasizing the need for specialized model fine-tuning in severe pathology contexts.

CVNov 29, 2017
Deep Learning for identifying radiogenomic associations in breast cancer

Zhe Zhu, Ehab Albadawy, Ashirbani Saha et al.

Purpose: To determine whether deep learning models can distinguish between breast cancer molecular subtypes based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). Materials and methods: In this institutional review board-approved single-center study, we analyzed DCE-MR images of 270 patients at our institution. Lesions of interest were identified by radiologists. The task was to automatically determine whether the tumor is of the Luminal A subtype or of another subtype based on the MR image patches representing the tumor. Three different deep learning approaches were used to classify the tumor according to their molecular subtypes: learning from scratch where only tumor patches were used for training, transfer learning where networks pre-trained on natural images were fine-tuned using tumor patches, and off-the-shelf deep features where the features extracted by neural networks trained on natural images were used for classification with a support vector machine. Network architectures utilized in our experiments were GoogleNet, VGG, and CIFAR. We used 10-fold crossvalidation method for validation and area under the receiver operating characteristic (AUC) as the measure of performance. Results: The best AUC performance for distinguishing molecular subtypes was 0.65 (95% CI:[0.57,0.71]) and was achieved by the off-the-shelf deep features approach. The highest AUC performance for training from scratch was 0.58 (95% CI:[0.51,0.64]) and the best AUC performance for transfer learning was 0.60 (95% CI:[0.52,0.65]) respectively. For the off-the-shelf approach, the features extracted from the fully connected layer performed the best. Conclusion: Deep learning may play a role in discovering radiogenomic associations in breast cancer.