Detecting and Monitoring Bias for Subgroups in Breast Cancer Detection AI
This work addresses bias detection in AI for breast cancer screening, which is crucial for ensuring equitable healthcare outcomes. Its contribution is incremental, however, because it applies existing monitoring methods to this domain rather than proposing new ones.
The paper analyzed AI models for breast cancer detection on mammography datasets and found notable underperformance in certain subgroups, highlighting the need for ongoing monitoring to detect performance drifts and enable timely interventions.
Automated mammography screening plays an important role in early breast cancer detection. However, current machine learning models, developed on particular training datasets, may exhibit performance degradation and bias when deployed in real-world settings. In this paper, we analyze the performance of high-performing AI models on two mammography datasets: the Emory Breast Imaging Dataset (EMBED) and the RSNA 2022 challenge dataset. Specifically, we evaluate how these models perform across different subgroups, defined by six attributes, and use a range of classification metrics to detect potential biases. Our analysis identifies certain subgroups that demonstrate notable underperformance, highlighting the need for ongoing monitoring of these subgroups' performance. To address this, we adopt a monitoring method designed to detect performance drifts over time. When it identifies a drift, the method issues an alert, which can enable timely interventions. This approach not only provides a tool for tracking performance but also helps ensure that AI models continue to perform effectively across diverse populations.
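The two steps described in the abstract, per-subgroup evaluation and drift alerting, can be illustrated with a minimal sketch. This is not the paper's monitoring method, which the abstract does not specify. It substitutes a simple rule that flags any subgroup whose recall in a time window falls more than a fixed tolerance below a reference value. The function names `subgroup_recall` and `drift_alerts`, the `tolerance` parameter, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def subgroup_recall(records):
    """Return recall (sensitivity) per subgroup.

    records: iterable of (subgroup, y_true, y_pred) with binary labels.
    Subgroups with no positive cases are omitted, because recall is undefined.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                true_positives[group] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def drift_alerts(windows, baseline, tolerance=0.1):
    """Return (window_index, subgroup) pairs where recall < baseline - tolerance.

    Assumption: a fixed-threshold rule stands in for the paper's monitoring
    method, which is not described in the abstract.
    windows: list of record lists, one per time window, oldest first.
    baseline: dict mapping subgroup -> reference recall.
    """
    alerts = []
    for index, records in enumerate(windows):
        current = subgroup_recall(records)
        for group, reference in baseline.items():
            if group in current and current[group] < reference - tolerance:
                alerts.append((index, group))
    return alerts


# Toy data (hypothetical): subgroup "A" loses recall in the later window.
reference_window = [("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("B", 1, 1), ("B", 1, 0)]
later_window = [("A", 1, 1), ("A", 1, 0), ("B", 1, 1), ("B", 1, 0)]

baseline = subgroup_recall(reference_window)
alerts = drift_alerts([reference_window, later_window], baseline)
print(alerts)
```

In a real deployment the subgroups would come from the six attributes the paper evaluates, recall would be one of several classification metrics, and a statistically grounded drift test would likely replace the fixed tolerance, since small subgroups produce noisy estimates.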