Comparative Evaluation of Radiomics and Deep Learning Models for Disease Detection in Chest Radiography
It provides practical guidance for selecting AI models in medical imaging, addressing a gap for clinicians and researchers, but is incremental as it compares existing methods.
This study compared radiomics-based and deep learning models for detecting diseases like COVID-19 in chest X-rays, finding that deep learning models like InceptionV3 achieved higher AUC scores (e.g., 0.996 at 4000 samples) and better scalability with more data, while radiomics models performed worse (e.g., Random Forest AUC 0.885).
The application of artificial intelligence (AI) in medical imaging has revolutionized diagnostic practices, enabling advanced analysis and interpretation of radiological data. This study presents a comprehensive evaluation of radiomics-based and deep learning-based approaches for disease detection in chest radiography, focusing on COVID-19, lung opacity, and viral pneumonia. While deep learning models, particularly convolutional neural networks and vision transformers, learn directly from image data, radiomics-based models extract handcrafted features, offering potential advantages in data-limited scenarios. We systematically compared the diagnostic performance of various AI models, including Decision Trees, Gradient Boosting, Random Forests, Support Vector Machines, and Multi-Layer Perceptrons for radiomics, against state-of-the-art deep learning models such as InceptionV3, EfficientNetL, and ConvNeXtXLarge. Performance was evaluated across multiple sample sizes. At 24 samples, EfficientNetL achieved an AUC of 0.839, outperforming SVM (AUC = 0.762). At 4000 samples, InceptionV3 achieved the highest AUC of 0.996, compared to 0.885 for Random Forest. A Scheirer-Ray-Hare test confirmed significant main and interaction effects of model type and sample size on all metrics. Post hoc Mann-Whitney U tests with Bonferroni correction further revealed consistent performance advantages for deep learning models across most conditions. These findings provide statistically validated, data-driven recommendations for model selection in diagnostic AI. Deep learning models demonstrated higher performance and better scalability with increasing data availability, while radiomics-based models may remain useful in low-data contexts. This study addresses a critical gap in AI-based diagnostic research by offering practical guidance for deploying AI models across diverse clinical environments.