CertiFair: A Framework for Certified Global Fairness of Neural Networks
This work addresses fairness certification for neural networks, which is crucial for deploying AI in sensitive domains such as hiring or lending, though it builds incrementally on existing fairness verification methods.
The paper tackles the problem of verifying and training neural networks for global individual fairness, ensuring similar individuals receive similar classifications. It presents a sound and complete verifier and a training method that improves fairness by 96% on standard datasets with minimal accuracy loss.
We consider the problem of whether a Neural Network (NN) model satisfies global individual fairness. Individual fairness requires that individuals who are similar with respect to a certain task be treated similarly by the decision model. In this work, we have two main objectives. The first is to construct a verifier that checks whether the fairness property holds for a given NN in a classification task, or provides a counterexample if it is violated; i.e., the model is fair if all similar individuals are classified the same, and unfair if a pair of similar individuals is classified differently. To that end, we construct a sound and complete verifier that verifies global individual fairness properties of ReLU NN classifiers using distance-based similarity metrics. The second objective of this paper is to provide a method for training provably fair NN classifiers from unfair (biased) data. We propose a fairness loss that can be used during training to enforce fair outcomes for similar individuals. We then provide provable bounds on the fairness of the resulting NN. We run experiments on commonly used, publicly available fairness datasets and show that global individual fairness can be improved by 96% without a significant drop in test accuracy.
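To make the notion of a counterexample concrete, the following minimal Python sketch checks whether one given pair of inputs violates individual fairness for a small ReLU classifier. It is not the paper's verifier: a sound and complete verifier must reason over all similar pairs, whereas this only tests a single pair. The similarity metric used here (non-sensitive features within `delta` in the L-infinity norm, sensitive features free to differ) is an assumed instance of a distance-based metric, and the function names, weights, and `delta` value are all hypothetical.

```python
import numpy as np

def relu_net_predict(weights, biases, x):
    """Forward pass of a fully connected ReLU classifier; returns the class index."""
    h = np.asarray(x, dtype=float)
    # ReLU on every hidden layer; the last layer produces raw logits.
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    logits = weights[-1] @ h + biases[-1]
    return int(np.argmax(logits))

def are_similar(x, x_prime, sensitive_idx, delta):
    """Assumed similarity metric: non-sensitive features differ by at most
    delta (L-infinity); sensitive features may differ arbitrarily."""
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    mask = np.ones(x.shape[0], dtype=bool)
    mask[list(sensitive_idx)] = False
    if not mask.any():  # every feature is sensitive, so any pair is similar
        return True
    return bool(np.max(np.abs(x[mask] - x_prime[mask])) <= delta)

def is_counterexample(weights, biases, x, x_prime, sensitive_idx, delta):
    """A pair is a fairness counterexample if it is similar but classified differently."""
    if not are_similar(x, x_prime, sensitive_idx, delta):
        return False
    return relu_net_predict(weights, biases, x) != relu_net_predict(weights, biases, x_prime)

# Hypothetical toy networks with two features; feature 1 is the sensitive attribute.
hidden_W, hidden_b = np.eye(2), np.zeros(2)

# Biased network: the output layer reads the sensitive feature directly.
biased_weights = [hidden_W, np.array([[0.0, 1.0], [0.5, 0.0]])]
biased_biases = [hidden_b, np.zeros(2)]

# Network whose output ignores the sensitive feature.
fair_weights = [hidden_W, np.array([[1.0, 0.0], [0.0, 0.0]])]
fair_biases = [hidden_b, np.array([0.0, 0.3])]

# Two individuals who differ only in the sensitive attribute.
x = np.array([0.5, 0.0])
x_prime = np.array([0.5, 1.0])

biased_violation = is_counterexample(biased_weights, biased_biases, x, x_prime, [1], delta=0.05)
fair_violation = is_counterexample(fair_weights, fair_biases, x, x_prime, [1], delta=0.05)
```

In this toy setup the pair is a counterexample for the biased network and not for the second one. Passing a pair check like this says nothing about global fairness; it only illustrates the property that a verifier would have to establish for every similar pair in the input domain.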