CV · Sep 18, 2025

Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease

arXiv:2509.15083v1 · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work addresses the unreliability of AI segmentation for preoperative planning in lung transplant candidates with severe pathology. Its contribution is incremental: it tests existing models on new data rather than proposing a new method.

This study evaluated three deep learning models for lung segmentation in transplant-eligible patients with severe lung disease. Unet-R231 outperformed the others, but all models performed significantly worse in moderate-to-severe cases, most notably in volumetric similarity.

This study evaluates publicly available deep-learning-based lung segmentation models in transplant-eligible patients to determine their performance across disease severity levels, pathology categories, and lung sides, and to identify limitations affecting their use in preoperative planning for lung transplantation. This retrospective study included 32 patients who underwent chest CT scans at Duke University Health System between 2017 and 2019 (3,645 2D axial slices in total). Patients with standard axial CT scans were selected based on the presence of two or more lung pathologies of varying severity. Lung segmentation was performed using three previously developed deep learning models: Unet-R231, TotalSegmentator, and MedSAM. Performance was assessed using quantitative metrics (volumetric similarity, Dice similarity coefficient, Hausdorff distance) and a qualitative measure (a four-point clinical acceptability scale). Unet-R231 consistently outperformed TotalSegmentator and MedSAM overall and across severity levels and pathology categories (p<0.05). All models showed significant performance declines from mild to moderate-to-severe cases, particularly in volumetric similarity (p<0.05), without significant differences between lung sides or among pathology types. Unet-R231 provided the most accurate automated lung segmentation among the evaluated models, with TotalSegmentator a close second, though the performance of both declined significantly in moderate-to-severe cases, underscoring the need for specialized model fine-tuning in severe pathology contexts.
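The three quantitative metrics named above can be illustrated with a minimal sketch. The abstract does not give the exact formulas or implementation the authors used, so the definitions below are the commonly used ones and are an assumption here; the function names and the toy masks are hypothetical. The sketch works on small 2D binary masks, measures distance in pixel units, and ignores voxel spacing.

```python
from math import dist


def voxels(mask):
    """Return the set of (row, col) coordinates where a 2D binary mask is nonzero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice(a, b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def volumetric_similarity(a, b):
    """Volumetric similarity: 1 - ||A| - |B|| / (|A| + |B|).

    Compares sizes only; it does not look at where the regions overlap.
    """
    total = len(a) + len(b)
    return 1 - abs(len(a) - len(b)) / total if total else 1.0


def hausdorff(a, b):
    """Symmetric Hausdorff distance in pixel units.

    Brute force over all point pairs, so only suitable for small masks.
    Both sets must be non-empty.
    """
    def directed(x, y):
        return max(min(dist(p, q) for q in y) for p in x)

    return max(directed(a, b), directed(b, a))


# Hypothetical toy masks: a reference segmentation and a model prediction.
reference = voxels([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])
predicted = voxels([
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

print(dice(reference, predicted))                   # 0.75
print(volumetric_similarity(reference, predicted))  # 1.0
print(hausdorff(reference, predicted))              # 1.0
```

In this toy case the two masks have the same area, so volumetric similarity is 1.0 even though one pixel is misplaced and Dice is 0.75. Under these assumed definitions, the metrics capture different kinds of error, which is one reason to report them together.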

