Deep Learning for Automated Classification of Tuberculosis-Related Chest X-Ray: Dataset Specificity Limits Diagnostic Performance Generalizability
This highlights a critical barrier for deploying AI in tuberculosis diagnosis across diverse populations, indicating that current models are incremental and not ready for broad clinical use.
The study tackled the problem of limited generalizability in deep learning models for tuberculosis detection from chest X-rays by training a model on one population's dataset and testing it on another, finding that diagnostic performance did not transfer effectively, with specific issues like dataset differences affecting results.
Machine learning has been an emerging tool for various aspects of infectious diseases including tuberculosis surveillance and detection. However, WHO provided no recommendations on using computer-aided tuberculosis detection software because of the small number of studies, methodological limitations, and limited generalizability of the findings. To quantify the generalizability of the machine-learning model, we developed a Deep Convolutional Neural Network (DCNN) model using a TB-specific CXR dataset of one population (National Library of Medicine Shenzhen No.3 Hospital) and tested it with non-TB-specific CXR dataset of another population (National Institute of Health Clinical Centers). The findings suggested that a supervised deep learning model developed by using the training dataset from one population may not have the same diagnostic performance in another population. Technical specification of CXR images, disease severity distribution, overfitting, and overdiagnosis should be examined before implementation in other settings.