Anterior's Approach to Fairness Evaluation of Automated Prior Authorization Systems
This work addresses fairness evaluation for administrative healthcare AI systems, but its contribution is incremental: it adapts existing fairness concepts to a specific domain.
The authors tackled the challenge of evaluating fairness in automated prior authorization systems by proposing a framework based on model error rates rather than approval outcomes. Applying it to 7,166 cases across 27 guidelines, they found consistent error rates across most demographic groups, with inconclusive evidence for race/ethnicity because of limited subgroup sample sizes.
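To make the error-rate framing concrete, here is a minimal Python sketch. It assumes each case is a hypothetical (group, model decision, human reviewer decision) record with binary approve/not-approve decisions; the `group_rates` function and the toy data are illustrative only and are not the authors' code. The example shows two groups whose model approval rates differ widely (0.9 vs. 0.4, because their underlying cases differ) while the model's error rate against human review is identical (0.1).

```python
from collections import defaultdict

# Hypothetical record layout: (demographic_group, model_decision, human_decision),
# where True means "approve" and False means "not approved" (deny or pend).
Record = tuple[str, bool, bool]


def group_rates(records: list[Record]) -> dict[str, dict[str, float]]:
    """Per-group model approval rate and error rate (model disagrees with human review)."""
    counts = defaultdict(lambda: {"n": 0, "approved": 0, "errors": 0})
    for group, model_dec, human_dec in records:
        c = counts[group]
        c["n"] += 1
        c["approved"] += int(model_dec)
        c["errors"] += int(model_dec != human_dec)
    return {
        g: {"approval_rate": c["approved"] / c["n"], "error_rate": c["errors"] / c["n"]}
        for g, c in counts.items()
    }


# Toy data: human reviewers approved more of group A's cases (8/10) than group B's (5/10).
records: list[Record] = (
    [("A", True, True)] * 8 + [("A", True, False)] + [("A", False, False)]
    + [("B", True, True)] * 4 + [("B", False, True)] + [("B", False, False)] * 5
)
rates = group_rates(records)
# rates["A"] -> approval_rate 0.9, error_rate 0.1
# rates["B"] -> approval_rate 0.4, error_rate 0.1
```

An approval-rate parity check would flag this model, even though it is equally accurate for both groups; the error-rate view does not.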
Staffing constraints and turnaround-time pressures in prior authorization (PA) have led to growing use of automated decision systems to support PA review. Evaluating fairness in such systems poses unique challenges: legitimate clinical guidelines and medical necessity criteria often differ across demographic groups, which makes parity in approval rates an inappropriate fairness metric. We propose a fairness evaluation framework for prior authorization models based on model error rates rather than approval outcomes. Using 7,166 human-reviewed cases spanning 27 medical necessity guidelines, we assessed consistency across sex, age, race/ethnicity, and socioeconomic status. Our evaluation combined error-rate comparisons, tolerance-band analysis with a predefined ±5 percentage-point margin, statistical power evaluation, and protocol-controlled logistic regression. Across most demographics, model error rates were consistent and confidence intervals fell within the predefined tolerance band, indicating no meaningful performance differences. For race/ethnicity, point estimates remained small, but subgroup sample sizes were limited, resulting in wide confidence intervals and underpowered tests; the evidence is therefore inconclusive within the dataset we explored. These findings illustrate a rigorous, regulator-aligned approach to fairness evaluation in administrative healthcare AI systems.
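The tolerance-band and power steps can be sketched in Python as follows. This is a minimal sketch under assumptions the abstract does not state: each subgroup's error rate is compared against a reference group as a simple proportion, the confidence interval is a normal-approximation (Wald) 95% interval, and power is computed for a two-sided two-proportion z-test at the 5 percentage-point margin. The names `diff_ci`, `tolerance_verdict`, and `power_to_detect` and all counts are hypothetical, and the protocol-controlled logistic regression is not sketched.

```python
from statistics import NormalDist

MARGIN = 0.05  # the ±5 percentage-point tolerance band
ALPHA = 0.05   # assumed two-sided significance level (95% CI)
Z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96


def diff_ci(err_sub: int, n_sub: int, err_ref: int, n_ref: int) -> tuple[float, float, float]:
    """Error-rate difference (subgroup minus reference) with a Wald 95% CI."""
    p_sub, p_ref = err_sub / n_sub, err_ref / n_ref
    diff = p_sub - p_ref
    se = (p_sub * (1 - p_sub) / n_sub + p_ref * (1 - p_ref) / n_ref) ** 0.5
    return diff, diff - Z * se, diff + Z * se


def tolerance_verdict(lo: float, hi: float, margin: float = MARGIN) -> str:
    """Compare a CI against the band [-margin, +margin] (hypothetical decision rule)."""
    if -margin <= lo and hi <= margin:
        return "within band"   # whole CI inside the band: no meaningful difference
    if lo > margin or hi < -margin:
        return "outside band"  # whole CI beyond the band: meaningful difference
    return "inconclusive"      # CI straddles a band edge, typically from small n


def power_to_detect(p_ref: float, n_ref: int, n_sub: int, delta: float = MARGIN) -> float:
    """Approximate power of a two-sided two-proportion z-test to detect a subgroup
    error rate of p_ref + delta (unpooled SE, normal approximation)."""
    p_sub = p_ref + delta
    se = (p_ref * (1 - p_ref) / n_ref + p_sub * (1 - p_sub) / n_sub) ** 0.5
    shift = abs(delta) / se
    nd = NormalDist()
    return nd.cdf(shift - Z) + nd.cdf(-shift - Z)


# Hypothetical counts: a large and a small subgroup, each vs. a reference group
# with 200 errors in 2,000 cases (10% error rate).
_, lo, hi = diff_ci(165, 1500, 200, 2000)  # 11% error rate, n = 1,500
large = tolerance_verdict(lo, hi)          # "within band"
_, lo, hi = diff_ci(8, 60, 200, 2000)      # about 13% error rate, n = 60
small = tolerance_verdict(lo, hi)          # "inconclusive"
power_large = power_to_detect(0.10, 2000, 1500)  # roughly 0.99
power_small = power_to_detect(0.10, 2000, 60)    # roughly 0.19
```

The small hypothetical subgroup mirrors the pattern described for race/ethnicity: its point estimate is modest, but its interval crosses the band edge and its power to detect a 5-point gap is low, so the result supports neither parity nor disparity. Requiring the whole interval to sit inside the band is an equivalence-style criterion, which is stronger than merely failing to find a significant difference.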