Demystifying the Optimal Fair Classifier in Multi-Class Classification
For machine learning practitioners who need fair classifiers in multi-class settings, this work provides theoretically grounded, practical algorithms, though it is an incremental extension of existing fairness frameworks.
The paper addresses the challenge of achieving optimal accuracy-fairness trade-offs in multi-class classification, proposing two algorithms (in-processing and post-processing) that converge to the Pareto frontier, with experiments showing superior balance between accuracy and fairness.
Ensuring fair and equitable treatment across diverse groups, particularly in multi-class classification tasks, poses a significant challenge due to the persistent biases inherent in machine learning models. Most existing bias mitigation techniques are tailored to binary settings, and the presence of multi-dimensional outputs and complex fairness mechanisms makes their extension to multi-class scenarios neither straightforward nor effective. In this paper, we investigate two fundamental, unresolved challenges in fair classification: (i) characterizing the optimal accuracy-fairness frontier in multi-class settings, and (ii) designing practical algorithms that attain this optimum in different training phases. To tackle these challenges, we first specify an analytically tractable probabilistic formulation of the optimal classifier under fairness constraints. Building upon this, we propose two attribute-blind algorithms to enforce fairness requirements in practice: an in-processing approach for fairness intervention during training via the reduction approach, and a post-processing approach for fine-tuning output probabilities with plug-in estimation. Theoretical analysis reveals that both methods converge to the optimal accuracy-fairness Pareto frontier. Experiments conducted on multiple datasets demonstrate the superior performance of our methods in balancing accuracy and fairness.
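To make the accuracy-fairness trade-off concrete, the following minimal sketch shows one way to score a multi-class classifier on both axes and to pick out the non-dominated (Pareto-optimal) operating points. It is an illustration, not the paper's algorithm. The abstract does not name a fairness criterion, so demographic parity is assumed here, and the function names `demographic_parity_gap` and `pareto_front` are hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def demographic_parity_gap(y_pred, group, n_classes):
    """Assumed fairness measure: the largest difference, over classes,
    between the highest and lowest per-group prediction rates."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    # rates[g, k] = fraction of group g that is predicted as class k
    rates = np.array([
        [np.mean(y_pred[group == g] == k) for k in range(n_classes)]
        for g in np.unique(group)
    ])
    return float(np.max(rates.max(axis=0) - rates.min(axis=0)))

def pareto_front(points):
    """Keep the (accuracy, gap) pairs that no other pair dominates.
    A pair dominates another if it has accuracy at least as high and
    gap at least as low, with at least one strict improvement."""
    front = []
    for i, (acc_i, gap_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and gap_j <= gap_i and (acc_j > acc_i or gap_j < gap_i)
            for j, (acc_j, gap_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((acc_i, gap_i))
    return sorted(front)
```

In use, each candidate classifier (for example, one per fairness tolerance level) would be evaluated with `accuracy` and `demographic_parity_gap` on held-out data, and `pareto_front` would then filter the resulting pairs down to the empirical frontier.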