FairLogue: Evaluating Intersectional Fairness across Clinical Machine Learning Use Cases using the All of Us Research Program
FairLogue provides a toolkit for intersectional fairness auditing in clinical ML, but its practical impact is limited by the finding that most observed disparities were comparable to those expected under randomized group membership.
FairLogue, a toolkit for intersectional fairness auditing, was applied to two clinical prediction models using the All of Us dataset. Intersectional evaluation revealed larger disparities than single-axis analyses, but counterfactual diagnostics showed most disparities were comparable to those expected under randomized group membership.
Intersectional biases in healthcare data can produce compound disparities in clinical machine learning models, yet most fairness evaluations assess demographic attributes independently. FairLogue, a toolkit for intersectional fairness auditing, was applied to two clinical prediction tasks to evaluate disparities across combined demographic groups. Using the All of Us dataset, two published models were selected for replication and evaluation: (A) prediction of selective serotonin reuptake inhibitor (SSRI)-associated bleeding events and (B) prediction of two-year stroke risk in patients with atrial fibrillation. Observational fairness metrics were computed across race, gender, and intersectional subgroups, followed by counterfactual analysis to evaluate whether disparities were attributable to group membership. Intersectional evaluation revealed larger disparities than single-axis analyses; however, counterfactual diagnostics indicated that most observed disparities were comparable to those expected under randomized group membership. These results highlight the importance of intersectional fairness auditing and demonstrate how FairLogue provides deeper insight into bias in clinical machine learning systems.
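The two-stage evaluation described above (an observational disparity metric, then a comparison against randomized group membership) can be illustrated with a minimal sketch. This is not FairLogue's API or the authors' counterfactual procedure: the function names `tpr_gap` and `permutation_gap_test` are hypothetical, the data are synthetic, the metric is a simple true-positive-rate gap, and a label-permutation test is used as a stand-in for the counterfactual diagnostics.

```python
import numpy as np

def tpr_gap(y_true, y_pred, groups):
    """Largest minus smallest true positive rate across groups.

    Hypothetical helper, not part of FairLogue. Groups with no
    positive cases are skipped.
    """
    rates = []
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        if mask.sum() > 0:
            rates.append(y_pred[mask].mean())
    return max(rates) - min(rates)

def permutation_gap_test(y_true, y_pred, groups, n_perm=200, seed=0):
    """Compare the observed gap with gaps under shuffled group labels.

    Assumption: shuffling labels approximates "randomized group
    membership"; the paper's counterfactual analysis may differ.
    """
    rng = np.random.default_rng(seed)
    observed = tpr_gap(y_true, y_pred, groups)
    null = np.array([
        tpr_gap(y_true, y_pred, rng.permutation(groups))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Synthetic data only: predictions are drawn independently of group.
rng = np.random.default_rng(42)
n = 2000
race = rng.choice(["A", "B"], size=n)
gender = rng.choice(["F", "M"], size=n)
intersection = np.array([f"{r}_{g}" for r, g in zip(race, gender)])
y_true = rng.integers(0, 2, size=n)
y_pred = rng.integers(0, 2, size=n)

obs_race, p_race = permutation_gap_test(y_true, y_pred, race)
obs_inter, p_inter = permutation_gap_test(y_true, y_pred, intersection)
print(f"race gap={obs_race:.3f} (p={p_race:.3f})")
print(f"intersectional gap={obs_inter:.3f} (p={p_inter:.3f})")
```

For this particular metric, each single-axis rate is a weighted average of the rates of its intersectional subgroups, so the intersectional gap in the sketch cannot be smaller than the single-axis gap when every subgroup contains positive cases. That is one reason to compare an observed gap with a null distribution instead of reading its size alone.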