CV CY LG | May 2, 2023

On the Impact of Data Quality on Image Classification Fairness

arXiv:2305.01595v1 | 5.97 citations

Originality: Synthesis-oriented

AI Analysis

The paper addresses fairness in algorithmic decision-making for image classification, but its approach is incremental: it applies existing methods to analyze the impact of noise.

This paper investigates how data quality, specifically label inaccuracies and data distortions, affects fairness metrics in image classification models, finding that increased noise in training data leads to reduced fairness across various algorithms.

With the proliferation of algorithmic decision-making, these systems have come under increased scrutiny. This paper explores the relationship between the quality of the training data and the overall fairness of the models trained on that data in the context of supervised classification. We measure key fairness metrics across a range of algorithms over multiple image classification datasets with varying levels of noise in both the labels and the training data itself. We define noise in the labels as inaccuracies in the labelling of the training set, and noise in the data as distortions of the training examples themselves. By adding noise to the original datasets, we can explore the relationship between the quality of the training data and the fairness of the output of the models trained on that data.
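
The two kinds of noise described in the abstract can be illustrated with a minimal sketch. The abstract does not say which noise models the authors use, so this sketch assumes symmetric label flipping for label noise and additive Gaussian pixel noise for data noise. The function names `add_label_noise` and `add_pixel_noise`, the noise rates, and the random stand-in data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def add_label_noise(labels, noise_rate, num_classes, rng):
    """Reassign a fraction of labels to a different class chosen uniformly.

    Assumption: symmetric label flipping; the paper's noise model may differ.
    """
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_flip = int(round(noise_rate * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


def add_pixel_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise to images scaled to [0, 1].

    Assumption: Gaussian distortion; the paper's distortions may differ.
    """
    images = np.asarray(images, dtype=np.float64)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


# Random stand-in data: 1000 "images" of 8x8 pixels with 10 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
images = rng.random((1000, 8, 8))

noisy_labels = add_label_noise(labels, noise_rate=0.2, num_classes=10, rng=rng)
noisy_images = add_pixel_noise(images, sigma=0.1, rng=rng)
```

Under these assumptions, a study of this kind would train a model on each noisy copy of the training set at several noise levels and compare fairness metrics against a model trained on the clean data.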
