LG AI · May 2, 2024

Individual Fairness Through Reweighting and Tuning

Abdoul Jalil Djiberou Mahamadou, Lea Goetz, Russ Altman

arXiv:2405.01711

Originality: Synthesis-oriented

AI Analysis

This work addresses fairness in AI for credit approval applications, but it is incremental as it builds on prior GLR methods and focuses on metric evaluation.

The paper addresses individual fairness in AI systems. It investigates whether defining a Graph Laplacian Regularizer (GLR) independently on the train and target data maintains accuracy comparable to prior methods that define it on both combined, and it introduces the Normalized Fairness Gain (NFG) metric to measure how much fairness is gained by using a GLR. On the German Credit Approval dataset, the two methods showed statistically similar performance. The results also revealed that Prediction Consistency (PC) scores can be misleading, whereas NFG provides better insight.

Inherent bias within society can be amplified and perpetuated by artificial intelligence (AI) systems. To address this issue, a wide range of solutions have been proposed to identify and mitigate bias and enforce fairness for individuals and groups. Recently, the Graph Laplacian Regularizer (GLR), a regularization technique from the semi-supervised learning literature, has been used as a substitute for the common Lipschitz condition to enhance individual fairness. Notable prior work has shown that enforcing individual fairness through a GLR can improve the transfer learning accuracy of AI models under covariate shifts. However, the prior work defines a GLR on the source and target data combined, implicitly assuming that the target data are available at train time, which might not hold in practice. In this work, we investigated whether defining a GLR independently on the train and target data could maintain similar accuracy. Furthermore, we introduced the Normalized Fairness Gain (NFG) score, which quantifies individual fairness as the amount of fairness gained when a GLR is used versus not. We evaluated the new and original methods under NFG, the Prediction Consistency (PC), and traditional classification metrics on the German Credit Approval dataset. The results showed that the two models achieved statistically similar mean performance over five-fold cross-validation. Furthermore, the proposed metric showed that PC scores can be misleading: they can be high and statistically similar to those of fairness-enhanced models even when NFG scores are small. This work therefore provides new insights into when a GLR effectively enhances individual fairness and the pitfalls of PC.
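
To make the GLR idea concrete, here is a minimal sketch of a graph Laplacian penalty computed on a single dataset, which is how one might define it separately for train and target data. It is an illustration under stated assumptions, not the authors' implementation: the RBF similarity graph, the `gamma` parameter, and the function names `rbf_similarity` and `glr_penalty` are all hypothetical choices that the abstract does not specify. The sketch relies only on the standard identity f^T L f = (1/2) Σ_ij W_ij (f_i - f_j)^2 for a symmetric similarity matrix W, so the penalty is small when similar individuals receive similar scores.

```python
import numpy as np


def rbf_similarity(X, gamma=1.0):
    """Dense similarity graph with W[i, j] = exp(-gamma * ||x_i - x_j||^2).

    The RBF kernel is an assumed choice of similarity, used here only
    for illustration.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    W = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def glr_penalty(X, scores, gamma=1.0):
    """Graph Laplacian penalty scores^T L scores, with L = D - W.

    The graph is built from X alone, so calling this once on the train
    set and once on the target set keeps the two graphs independent.
    """
    W = rbf_similarity(X, gamma)
    L = np.diag(W.sum(axis=1)) - W
    return float(scores @ L @ scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for train and target features and model scores.
    X_train, f_train = rng.normal(size=(8, 3)), rng.normal(size=8)
    X_target, f_target = rng.normal(size=(5, 3)), rng.normal(size=5)
    print("train penalty:", glr_penalty(X_train, f_train))
    print("target penalty:", glr_penalty(X_target, f_target))
```

In a training loop, a term like `glr_penalty` would typically be added to the classification loss with a tunable weight; how that weight is tuned, and how NFG and PC are computed from the resulting models, is left to the paper itself.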
