LG AI CY | Jul 24, 2024

Dataset Distribution Impacts Model Fairness: Single vs. Multi-Task Learning

Ralf Raumanns, Gerard Schouten, Josien P. W. Pluim, Veronika Cheplygina

arXiv:2407.17543v26.43 citations | h-index: 55

Originality: Incremental advance

AI Analysis

This work addresses fairness issues in medical AI for dermatology. It is an incremental advance: it builds on existing bias research and contributes specific dataset manipulations.

The study examines sex bias in skin lesion classification models by evaluating how dataset distribution and learning strategy affect fairness. It finds that the adversarial model eliminated sex bias only in cases involving female patients alone, and that including male patients in the training data improved performance for the male subgroup, even when female patients were the majority.

The influence of bias in datasets on the fairness of model predictions is a topic of ongoing research in various fields. We evaluate the performance of skin lesion classification using ResNet-based CNNs, focusing on patient sex variations in training data and three different learning strategies. We present a linear programming method for generating datasets with varying patient sex and class labels, taking into account the correlations between these variables. We evaluated the model performance using three different learning strategies: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our observations include: 1) sex-specific training data yields better results, 2) single-task models exhibit sex bias, 3) the reinforcement approach does not remove sex bias, 4) the adversarial model eliminates sex bias in cases involving only female patients, and 5) datasets that include male patients enhance model performance for the male subgroup, even when female patients are the majority. To generalise these findings, in future research, we will examine more demographic attributes, like age, and other possibly confounding factors, such as skin colour and artefacts in the skin lesions. We make all data and models available on GitHub.
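The abstract does not give the linear program itself, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a binary label (the names `malignant` and `benign` are hypothetical), a target share of female patients, and a target class prevalence that is identical for both sexes so that sex and label are decorrelated. Under those assumptions the problem reduces to a one-variable linear program (maximise the subset size N subject to per-cell availability), which can be solved in closed form. The function name `compose_subset` and the availability counts are illustrative, not taken from the paper.

```python
from fractions import Fraction
from math import floor


def compose_subset(available, female_fraction, malignant_fraction):
    """Pick per-(sex, label) sample counts for the largest feasible subset.

    available: dict mapping (sex, label) -> number of images on hand.
    female_fraction: target share of female patients in the subset.
    malignant_fraction: target share of malignant lesions, same for both sexes.
    """
    # Exact rational arithmetic avoids floating-point surprises when flooring.
    f = Fraction(female_fraction).limit_denominator(10_000)
    m = Fraction(malignant_fraction).limit_denominator(10_000)
    sex_share = {"female": f, "male": 1 - f}
    label_share = {"malignant": m, "benign": 1 - m}

    # Target share of each (sex, label) cell in the final subset.
    weights = {
        (s, l): sex_share[s] * label_share[l]
        for s in sex_share
        for l in label_share
    }

    # One-variable LP: maximise N subject to N * w <= available for every cell.
    # The optimum is the tightest of the per-cell upper bounds.
    n_max = min(
        Fraction(available[cell]) / w for cell, w in weights.items() if w > 0
    )

    # Round down so that no cell exceeds what is available.
    return {cell: floor(n_max * w) for cell, w in weights.items()}


# Hypothetical availability counts, for illustration only.
available = {
    ("female", "malignant"): 200,
    ("female", "benign"): 1800,
    ("male", "malignant"): 300,
    ("male", "benign"): 2000,
}
counts = compose_subset(available, female_fraction=0.75, malignant_fraction=0.1)
```

Setting `female_fraction=1` gives a female-only subset, which corresponds to the sex-specific training condition mentioned in the abstract. A formulation with additional constraints (for example, fixed subset size across all sex ratios) would need a general LP solver rather than this closed form.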
