Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction
This work addresses bias quantification in healthcare ML, specifically for renal failure prediction, but appears incremental as it focuses on comparing existing methods.
The study compared a traditional risk score (Tangri) with a machine learning model trained on 1.6 million patients' EHR data for renal failure prediction, aiming to quantify biases in clinical practice versus ML-driven approaches.
As machine learning (ML) models trained on real-world datasets become common practice, it is critical to measure and quantify their potential biases. In this paper, we focus on renal failure and compare a commonly used traditional risk score, Tangri, with a more powerful machine learning model that has access to a larger variable set and is trained on 1.6 million patients' EHR data. We compare and discuss the generalization and applicability of these two models in an attempt to quantify the biases of status quo clinical practice relative to those of ML-driven models.
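
One simple way to make such a comparison concrete is to measure how well each model discriminates (AUC) within each patient subgroup and report the spread between the best and worst subgroup. The abstract does not specify which bias metrics the paper uses, so the following is only a minimal sketch under that assumption. The function names, the subgroup labels "A" and "B", and all scores are hypothetical toy values, not data or results from the study.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined unless both classes are present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subgroup_auc(labels, scores, groups):
    """Compute AUC separately for each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}

def auc_gap(per_group):
    """Spread between the best and worst subgroup AUC (0.0 means equal performance)."""
    vals = [v for v in per_group.values() if v is not None]
    return max(vals) - min(vals)

# Synthetic toy data (illustrative only): 1 = renal failure observed, 0 = not observed.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
risk_score = [0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4, 0.5]  # stand-in for a traditional score
ml_model = [0.9, 0.1, 0.7, 0.2, 0.8, 0.3, 0.6, 0.4]    # stand-in for an ML model

for name, scores in [("risk_score", risk_score), ("ml_model", ml_model)]:
    per_group = subgroup_auc(labels, scores, groups)
    print(name, per_group, "gap:", auc_gap(per_group))
```

A large gap for one model and a small gap for the other would suggest that the two generalize differently across subgroups. In practice, a library routine such as scikit-learn's `roc_auc_score`, together with confidence intervals, would be preferable to this quadratic-time pairwise count.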