Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases
This study addresses inequities in clinical decision-making for patients with chronic diseases. It shows that representativeness in training data alone is insufficient for equitable outcomes, an incremental contribution to fairness in biomedical informatics.
The study tackled performance disparities in ML models for clinical support in diabetes and heart diseases, finding that while sex-related disparities were mild (favoring males), age-related differences were significant with better accuracy for younger patients and inconsistent accuracy for older patients across seven datasets.
Machine Learning (ML) algorithms are vital for supporting clinical decision-making in biomedical informatics. However, their predictive performance can vary across demographic groups, often due to the underrepresentation of historically marginalized populations in training datasets. This investigation reveals widespread sex- and age-related inequities in chronic disease datasets and the ML models derived from them. To characterize these inequities, a novel analytical framework is introduced that combines systematic arbitrariness with traditional metrics such as accuracy and data complexity. The analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities, with predictive accuracy favoring males, and significant age-related differences, with better accuracy for younger patients. Notably, older patients showed inconsistent predictive accuracy across seven datasets, which was linked to higher data complexity and lower model performance. These findings highlight that representativeness in training data alone does not guarantee equitable outcomes, and that model arbitrariness must be addressed before models are deployed in clinical settings.
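As a rough illustration of how per-group accuracy and arbitrariness might be reported side by side, the sketch below computes both on toy data. It is not the framework used in the study. The abstract does not define arbitrariness, so the sketch assumes it is the share of individuals in a group whose predicted label is not unanimous across several models, for example models trained on different resamples. The function names, group labels, and data are all hypothetical.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each demographic group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}


def group_arbitrariness(pred_runs, groups):
    """Share of individuals per group whose predicted label is not
    unanimous across models (an assumed definition, for illustration)."""
    flips, totals = defaultdict(int), defaultdict(int)
    for i, group in enumerate(groups):
        labels = {run[i] for run in pred_runs}  # distinct labels for person i
        totals[group] += 1
        flips[group] += int(len(labels) > 1)
    return {g: flips[g] / totals[g] for g in totals}


# Toy data (hypothetical): six patients, three models' predictions.
y_true = [1, 0, 1, 0, 1, 0]
groups = ["younger", "younger", "younger", "older", "older", "older"]
pred_runs = [
    [1, 0, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
]

acc = group_accuracy(y_true, pred_runs[0], groups)
arb = group_arbitrariness(pred_runs, groups)
print("accuracy (first model):", acc)
print("arbitrariness:", arb)
```

In this toy case the two groups are equally represented, yet the older group has both lower accuracy and higher arbitrariness, which mirrors the abstract's point that representativeness alone does not guarantee equitable outcomes.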