IVCVJan 23, 2025

Segment-and-Classify: ROI-Guided Generalizable Contrast Phase Classification in CT Using XGBoost

arXiv:2501.14066v2h-index: 12
Originality Incremental advance
AI Analysis

This incremental work addresses the problem of automating medical image analysis for radiologists by improving classification accuracy and generalizability across datasets.

The study tackled automated contrast phase classification in CT by using organ-specific features from a segmentation tool with a gradient-boosted decision tree, achieving high AUCs (e.g., >0.937 on VinDr-Multiphase and >0.991 on C4KC-KiTS) and superior F1-scores in most phases.

Purpose: To automate contrast phase classification in CT using organ-specific features extracted from a widely used segmentation tool with a lightweight decision tree classifier. Materials and Methods: This retrospective study utilized three public CT datasets from separate institutions. The phase prediction model was trained on the WAW-TACE (median age: 66 [60,73]; 185 males) dataset, and externally validated on the VinDr-Multiphase (146 males; 63 females; 56 unk) and C4KC-KiTS (median age: 61 [50.68; 123 males) datasets. Contrast phase classification was performed using organ-specific features extracted by TotalSegmentator, followed by prediction using a gradient-boosted decision tree classifier. Results: On the VinDr-Multiphase dataset, the phase prediction model achieved the highest or comparable AUCs across all phases (>0.937), with superior F1-scores in the non-contrast (0.994), arterial (0.937), and delayed (0.718) phases. Statistical testing indicated significant performance differences only in the arterial and delayed phases (p<0.05). On the C4KC-KiTS dataset, the phase prediction model achieved the highest AUCs across all phases (>0.991), with superior F1-scores in arterial/venous (0.968) and delayed (0.935) phases. Statistical testing confirmed significant improvements over all baseline models in these two phases (p<0.05). Performance in the non-contrast class, however, was comparable across all models, with no statistically significant differences observed (p>0.05). Conclusion: The lightweight model demonstrated strong performance relative to all baseline models, and exhibited robust generalizability across datasets from different institutions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes