Intrinsic Bias Identification on Medical Image Datasets
This work addresses a challenge facing scientists and practitioners in medical imaging, who lack reliable unbiased test datasets for validating models; the contribution is incremental in that it builds on existing debiasing studies.
The paper tackles the problem of identifying implicit biases in medical image datasets, which degrade model generalizability. It proposes a novel bias identification framework comprising KlotskiNet and Bias Discriminant Direction Analysis, and reports its effectiveness on three datasets.
Machine learning-based medical image analysis depends heavily on datasets. Biases in a dataset can be learned by the model and degrade the generalizability of its applications. Debiased models have been studied; however, it remains difficult for scientists and practitioners to identify implicit biases in datasets, which leads to a lack of reliable unbiased test datasets for validating models. To tackle this issue, we first define the data intrinsic bias attribute and then propose a novel bias identification framework for medical image datasets. The framework contains two major components, KlotskiNet and Bias Discriminant Direction Analysis (BDDA): KlotskiNet builds the mapping that makes backgrounds distinguish positive and negative samples, and BDDA provides a theoretical solution for determining bias attributes. Experimental results on three datasets show the effectiveness of the bias attributes discovered by the framework.