LG · AI · IT · Jun 9, 2025

Fairness Overfitting in Machine Learning: An Information-Theoretic Perspective

arXiv:2506.07861v1 · 2 citations · h-index: 8 · ICML
Originality: Incremental advance
AI Analysis

This addresses a critical issue for high-stakes applications where fairness must generalize, though the contribution is incremental: it builds on existing fairness methods by adding theoretical guarantees.

The paper tackles fairness overfitting in machine learning, where fairness achieved during training may not carry over to unseen data. It proposes an information-theoretic framework for deriving tight fairness generalization bounds, and its empirical validation shows that these bounds are relevant across several fairness-aware learning algorithms.

Despite substantial progress in promoting fairness in high-stakes applications using machine learning models, existing methods often modify the training process, such as through regularizers or other interventions, but lack formal guarantees that fairness achieved during training will generalize to unseen data. Although overfitting with respect to prediction performance has been extensively studied, overfitting in terms of fairness loss has received far less attention. This paper proposes a theoretical framework for analyzing fairness generalization error through an information-theoretic lens. Our novel bounding technique is based on the Efron-Stein inequality, which allows us to derive tight information-theoretic fairness generalization bounds with both Mutual Information (MI) and Conditional Mutual Information (CMI). Our empirical results validate the tightness and practical relevance of these bounds across diverse fairness-aware learning algorithms. Our framework offers valuable insights to guide the design of algorithms improving fairness generalization.
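To make "fairness overfitting" concrete, the sketch below measures an empirical fairness generalization gap: the fairness loss on held-out data minus the fairness loss on training data. It assumes demographic parity difference (the absolute gap in positive-prediction rates between two groups) as the fairness loss and binary predictions with a binary sensitive attribute. Both choices are illustrative assumptions, not necessarily the loss the paper analyzes, and the function names are hypothetical. The code only computes the gap the bounds are about; it does not implement the paper's MI or CMI bounds.

```python
from typing import Sequence


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[int]) -> float:
    """Illustrative fairness loss: |P(yhat=1 | A=0) - P(yhat=1 | A=1)|.

    preds are binary predictions (0/1); groups is a binary sensitive attribute (0/1).
    """
    rates = []
    for g in (0, 1):
        members = [p for p, a in zip(preds, groups) if a == g]
        if not members:
            raise ValueError(f"group {g} has no samples")
        # Positive-prediction rate within group g.
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


def fairness_generalization_gap(
    train_preds: Sequence[int],
    train_groups: Sequence[int],
    test_preds: Sequence[int],
    test_groups: Sequence[int],
) -> float:
    """Held-out fairness loss minus training fairness loss.

    A positive value means the model looked fairer on its training data
    than on unseen data, i.e. fairness overfitting.
    """
    test_loss = demographic_parity_gap(test_preds, test_groups)
    train_loss = demographic_parity_gap(train_preds, train_groups)
    return test_loss - train_loss


if __name__ == "__main__":
    # Toy data (made up for illustration): perfectly balanced on the training set,
    # unbalanced on the held-out set.
    gap = fairness_generalization_gap(
        train_preds=[1, 0, 1, 0], train_groups=[0, 0, 1, 1],
        test_preds=[1, 1, 1, 0], test_groups=[0, 0, 1, 1],
    )
    print(gap)  # 0.5
```

In the toy example the training demographic parity gap is 0.0 while the held-out gap is 0.5, so a method that only enforces fairness on the training sample would report a misleadingly perfect score. Information-theoretic bounds of the kind described above aim to control the expected size of this difference.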
