Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness
This work addresses the problem of defining and ensuring individual fairness in machine learning models when an appropriate similarity metric is difficult to specify beforehand. It is an incremental advance that transfers techniques from adversarial robustness to the fairness setting.
The paper tackles the difficulty of specifying a suitable fairness metric a priori: rather than assessing a model's fairness against a predetermined metric, it proposes finding a metric under which a given model satisfies individual fairness. It introduces minimal metrics and applies randomized smoothing from adversarial robustness to make more complicated models individually fair under a given weighted $L^p$ metric. Experiments show meaningful and interpretable fairness guarantees at little cost to utility.
We turn the definition of individual fairness on its head: rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness. This can facilitate the discussion on the fairness of a model, addressing the issue that it may be difficult to specify a suitable metric a priori. Our contributions are twofold: First, we introduce the definition of a minimal metric and characterize the behavior of models in terms of minimal metrics. Second, for more complicated models, we apply the mechanism of randomized smoothing from adversarial robustness to make them individually fair under a given weighted $L^p$ metric. Our experiments show that adapting the minimal metrics of linear models to more complicated neural networks can lead to meaningful and interpretable fairness guarantees at little cost to utility.
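To make the smoothing idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes the weighted $L^2$ case with Gaussian noise, a binary classifier, and a Monte Carlo estimate of the smoothed output; the function names (`weighted_lp_distance`, `smoothed_predict`), the toy `base_classifier`, and all parameter values are hypothetical and chosen only for illustration. The noise distribution and certification procedure used in the paper may differ.

```python
import numpy as np


def weighted_lp_distance(x, x_prime, weights, p=2):
    """Weighted L^p distance: (sum_i |w_i * (x_i - x'_i)|^p)^(1/p)."""
    diff = np.asarray(weights, dtype=float) * (
        np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    )
    return float(np.sum(np.abs(diff) ** p) ** (1.0 / p))


def smoothed_predict(base_classifier, x, weights, sigma=0.5, n_samples=1000, seed=0):
    """Monte Carlo estimate of a Gaussian-smoothed binary classifier at x.

    Assumption: feature i receives noise with standard deviation
    sigma / weights[i], which is isotropic Gaussian noise in the rescaled
    coordinates z_i = w_i * x_i. Features with larger weights are
    therefore perturbed less. Weights must be strictly positive.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # One row of noise per sample, scaled per feature.
    noise = rng.normal(0.0, 1.0, size=(n_samples, x.size)) * (sigma / weights)
    labels = base_classifier(x + noise)  # integer labels in {0, 1}
    # Fraction of noisy copies assigned to each class.
    return np.bincount(labels, minlength=2) / n_samples


def base_classifier(X):
    """Hypothetical base model: a fixed linear threshold rule."""
    return (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)


if __name__ == "__main__":
    w = [1.0, 1.0]
    print(weighted_lp_distance([0.0, 0.0], [3.0, 4.0], w, p=2))
    print(smoothed_predict(base_classifier, [0.1, 0.0], w))
```

Because the smoothed output averages the base model over a neighborhood of the input, two individuals that are close under the weighted metric receive similar class probabilities, which is the sense in which smoothing can yield a fairness guarantee. The weights play the role of the metric: a larger weight on a feature means differences in that feature count for more, and the corresponding noise is smaller.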