LG · ML · Oct 4, 2023

Quantifying and mitigating the impact of label errors on model disparity metrics

arXiv:2310.02533v1 · 12 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work studies how label errors exacerbate model disparities, especially for minority groups. It is an incremental but important step in fairness-aware machine learning.

The study investigated how label errors in training and test data affect model disparity metrics. It found that group calibration and other disparity metrics are sensitive to label error, particularly for minority groups, and that this disparate effect persists even for models trained with noise-aware algorithms. To mitigate the impact of training-time label error, the authors proposed an approach that estimates the influence of a training input's label on a group disparity metric; compared with alternative approaches, it was significantly better at identifying training inputs that reduce group calibration error. They complemented it with an automatic relabel-and-finetune scheme that produces updated models with provably improved group calibration error.
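To make the sensitivity finding concrete, here is a minimal sketch that computes a binned expected calibration error separately within each group and evaluates it against test labels corrupted by symmetric label flips. Per-group binned ECE is one common definition of group calibration error and is assumed here for illustration; the paper's exact definition may differ. The helper names (`calibration_error`, `group_calibration_error`, `flip_labels`), the group sizes, and the noise rates are hypothetical.

```python
import numpy as np

def calibration_error(probs, labels, n_bins=10):
    """Binned expected calibration error (ECE) for binary predictions.

    probs are predicted P(y = 1); labels are observed 0/1 labels.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            # |mean confidence - observed positive rate|, weighted by the bin's share.
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

def group_calibration_error(probs, labels, groups, n_bins=10):
    """Calibration error computed separately within each group."""
    probs, labels, groups = map(np.asarray, (probs, labels, groups))
    return {int(g): calibration_error(probs[groups == g], labels[groups == g], n_bins)
            for g in np.unique(groups)}

def flip_labels(labels, rate, rng):
    """Symmetric label noise: flip each 0/1 label independently with prob. `rate`."""
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape) < rate
    labels[flip] = 1 - labels[flip]
    return labels

# Hypothetical demo: a predictor that is calibrated on clean labels, scored
# against increasingly noisy test labels for a large and a small group.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], [5000, 250])
probs = rng.uniform(0.05, 0.95, size=groups.size)
clean = (rng.random(groups.size) < probs).astype(int)
for rate in (0.0, 0.1, 0.3):
    noisy = flip_labels(clean, rate, rng)
    print(rate, group_calibration_error(probs, noisy, groups))
```

With flip rate r, a bin whose true positive rate is p shows an observed rate of p + r(1 - 2p), so even a perfectly calibrated model appears miscalibrated by roughly r·|1 - 2p| in that bin. The 250-example group also has far fewer examples per bin, so its estimate is noisier at every noise level.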

Errors in labels obtained via human annotation adversely affect a model's performance. Existing approaches propose ways to mitigate the effect of label error on a model's downstream accuracy, yet little is known about its impact on a model's disparity metrics. Here we study the effect of label error on a model's disparity metrics. We empirically characterize how varying levels of label error, in both training and test data, affect these disparity metrics. We find that group calibration and other metrics are sensitive to train-time and test-time label error -- particularly for minority groups. This disparate effect persists even for models trained with noise-aware algorithms. To mitigate the impact of training-time label error, we present an approach to estimate the influence of a training input's label on a model's group disparity metric. We empirically assess the proposed approach on a variety of datasets and find significant improvement, compared to alternative approaches, in identifying training inputs that improve a model's disparity metric. We complement the approach with an automatic relabel-and-finetune scheme that produces updated models with, provably, improved group calibration error.
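The influence-estimation and relabel-and-finetune steps can be sketched for L2-regularized logistic regression with the standard first-order influence-function approximation. This is a minimal illustration under stated assumptions, not the authors' estimator: binned calibration error is not differentiable, so the sketch swaps in a hypothetical surrogate (the squared gap between a group's mean predicted probability and its observed positive rate on a clean held-out set); the synthetic data, corruption pattern, and function names are hypothetical; and the few warm-started Newton steps that stand in for fine-tuning carry none of the paper's provable guarantees.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(X, theta, l2):
    """Hessian of the mean logistic loss + (l2 / 2) * ||theta||^2 at theta."""
    p = sigmoid(X @ theta)
    return (X * (p * (1 - p))[:, None]).T @ X / len(X) + l2 * np.eye(X.shape[1])

def fit_logreg(X, y, l2=1e-2, iters=50, theta=None):
    """Newton's method; passing theta warm-starts the fit (stand-in for fine-tuning)."""
    theta = np.zeros(X.shape[1]) if theta is None else theta.copy()
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(X) + l2 * theta
        theta = theta - np.linalg.solve(hessian(X, theta, l2), grad)
    return theta

def group_calib_surrogate(X_g, y_g, theta):
    """Hypothetical differentiable surrogate for a group's calibration error:
    f = (mean predicted prob - observed positive rate)^2, plus its gradient."""
    p = sigmoid(X_g @ theta)
    gap = p.mean() - y_g.mean()
    grad = 2 * gap * (X_g * (p * (1 - p))[:, None]).mean(axis=0)
    return gap ** 2, grad

def label_flip_influence(X, y, theta, X_g, y_g, l2=1e-2):
    """First-order estimate of how f changes if training label i is flipped.

    Flipping y_i shifts that example's loss gradient (p_i - y_i) x_i by
    (2 y_i - 1) x_i, so theta moves by about -(1/n) H^{-1} (2 y_i - 1) x_i.
    """
    _, grad_f = group_calib_surrogate(X_g, y_g, theta)
    h_inv_grad_f = np.linalg.solve(hessian(X, theta, l2), grad_f)  # H is symmetric
    return -(2 * y - 1) * (X @ h_inv_grad_f) / len(X)

def relabel_and_finetune(X, y, theta, X_g, y_g, k=10, l2=1e-2, iters=5):
    """Flip up to k labels predicted to lower f most, then warm-start a refit."""
    infl = label_flip_influence(X, y, theta, X_g, y_g, l2)
    top = np.argsort(infl)[:k]
    idx = top[infl[top] < 0]  # keep only flips predicted to help
    y_new = y.copy()
    y_new[idx] = 1 - y_new[idx]
    return fit_logreg(X, y_new, l2, iters, theta), y_new, idx

# Hypothetical demo: 400 training rows, the first 40 in a minority group.
rng = np.random.default_rng(0)
n = 400
group = np.zeros(n)
group[:40] = 1
X = np.column_stack([np.ones(n), rng.normal(size=n), group])  # bias, feature, group
true_theta = np.array([-0.5, 2.0, 0.5])
y = (rng.random(n) < sigmoid(X @ true_theta)).astype(int)
# Corrupt the minority group: flip about 40% of its positive labels to 0.
y[(group == 1) & (y == 1) & (rng.random(n) < 0.4)] = 0
# Assumed clean held-out labels for the minority group.
X_g = np.column_stack([np.ones(200), rng.normal(size=200), np.ones(200)])
y_g = (rng.random(200) < sigmoid(X_g @ true_theta)).astype(int)

theta = fit_logreg(X, y)
theta_ft, y_new, idx = relabel_and_finetune(X, y, theta, X_g, y_g, k=10)
print("surrogate before:", group_calib_surrogate(X_g, y_g, theta)[0])
print("surrogate after: ", group_calib_surrogate(X_g, y_g, theta_ft)[0])
print("relabeled rows:", idx)
```

A negative entry of `label_flip_influence` means flipping that label is predicted to lower the surrogate, so the most negative entries are the relabeling candidates. Because the estimate is first-order, it is most trustworthy for individual flips and small k.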
