An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets
This paper addresses bias in healthcare algorithms against disadvantaged groups. It offers a novel metric, though its improvement over existing bias detection methods is incremental.
The paper tackles bias in healthcare datasets by developing AEquity (AEq), a data-centric metric that evaluates how easily different groups are learned at small sample sizes, and applies it to detect and mitigate racial bias in chest X-ray diagnosis and healthcare utilization prediction.
The adoption of diagnostic and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration, with varying levels of success. Here, we develop a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity, AEq). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare: chest X-ray diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
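The abstract does not spell out how AEq is computed, so the following is only a rough illustration of the underlying idea: comparing how easily two groups are learned at small sample sizes. It is not the paper's metric. The synthetic data, the 1-D nearest-centroid classifier, and every name (`make_group`, `learning_curve`, `gap`) are assumptions made for this sketch.

```python
import random
from statistics import mean


def nearest_centroid_accuracy(train, test):
    """Fit a 1-D nearest-centroid classifier on train; return accuracy on test."""
    by_label = {0: [], 1: []}
    for x, y in train:
        by_label[y].append(x)
    # A tiny sample may contain only one class; then predict that class everywhere.
    if not by_label[0] or not by_label[1]:
        seen = 0 if by_label[0] else 1
        return mean([1.0 if y == seen else 0.0 for _, y in test])
    c0, c1 = mean(by_label[0]), mean(by_label[1])
    return mean(
        [1.0 if (abs(x - c1) < abs(x - c0)) == (y == 1) else 0.0 for x, y in test]
    )


def learning_curve(data, sizes, trials=200, seed=0):
    """Mean held-out accuracy for each small training-set size n."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        scores = []
        for _ in range(trials):
            shuffled = data[:]
            rng.shuffle(shuffled)
            # First n points train the classifier; the rest are held out.
            scores.append(nearest_centroid_accuracy(shuffled[:n], shuffled[n:]))
        curve[n] = mean(scores)
    return curve


def make_group(separation, n_per_class=200, seed=0):
    """Hypothetical group: two Gaussian classes whose means differ by `separation`."""
    rng = random.Random(seed)
    neg = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    pos = [(rng.gauss(separation, 1.0), 1) for _ in range(n_per_class)]
    return neg + pos


# Assumption: group A's labels are easy to separate, group B's are not.
group_a = make_group(separation=3.0, seed=1)
group_b = make_group(separation=0.5, seed=2)

sizes = [4, 8, 16, 32]
curve_a = learning_curve(group_a, sizes)
curve_b = learning_curve(group_b, sizes)

# A persistent positive gap means group B is harder to learn from few samples.
gap = {n: curve_a[n] - curve_b[n] for n in sizes}
for n in sizes:
    print(f"n={n:2d}  A={curve_a[n]:.3f}  B={curve_b[n]:.3f}  gap={gap[n]:+.3f}")
```

In this toy setting, a gap that stays positive across sample sizes flags a group whose outcome is harder to learn from the available features. That is the kind of dataset-level signal, independent of any particular production model, that the abstract describes as the target of a data-centric analysis.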