Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation
This work matters to machine learning practitioners and researchers building fair AI systems because it provides a method with theoretical guarantees for controlling the outcome parity of any downstream classifier.
This paper addresses the problem of controlling bias in training datasets to ensure fair outcomes in downstream applications. The authors propose limiting the mutual information between representations and protected attributes, using contrastive information estimators, and show theoretically that this controls the parity of any downstream classifier. On the UCI Adult and Heritage Health datasets, the method yields more informative representations across a range of desired parity thresholds than approaches that rely on variational bounds.
Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the ability of a particular adversary who seeks to maximize parity. Unfortunately, representations produced by adversarial approaches may still retain biases, as their efficacy is tied to the complexity of the adversary used during training. To this end, we theoretically establish that by limiting the mutual information between representations and protected attributes, we can assuredly control the parity of any downstream classifier. We demonstrate an effective method for controlling parity through mutual information based on contrastive information estimators and show that they outperform approaches that rely on variational bounds based on complex generative models. We test our approach on the UCI Adult and Heritage Health datasets and demonstrate that it provides more informative representations across a range of desired parity thresholds while providing strong theoretical guarantees on the parity of any downstream algorithm.
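To make the central claim concrete, the following is a minimal sketch of why a cap on the mutual information I(z; c) between a representation z and a binary protected attribute c also caps the parity gap of every classifier built on z. It does not reproduce the paper's theorem or its contrastive estimator. Instead it assumes a looser bound that follows from the data processing inequality and Pinsker's inequality, namely gap <= sqrt(I(z; c) / (2 * pi * (1 - pi))) with pi = P(c = 1) and information measured in nats. The function names, the toy data, and the plug-in estimator (usable only for small discrete z) are all illustrative assumptions.

```python
import math
from collections import Counter
from itertools import product


def parity_gap(preds, groups):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    rate = {}
    for g in (0, 1):
        members = [p for p, c in zip(preds, groups) if c == g]
        rate[g] = sum(members) / len(members)
    return abs(rate[1] - rate[0])


def mutual_information(zs, groups):
    """Plug-in estimate of I(z; c) in nats for discrete z and c."""
    n = len(zs)
    joint = Counter(zip(zs, groups))
    pz = Counter(zs)
    pc = Counter(groups)
    return sum(
        (k / n) * math.log(k * n / (pz[z] * pc[c]))
        for (z, c), k in joint.items()
    )


def parity_bound(mi_nats, pi):
    """Pinsker-based ceiling on the parity gap (an assumption of this sketch)."""
    return min(1.0, math.sqrt(mi_nats / (2 * pi * (1 - pi))))


# Toy data (hypothetical): a 3-valued representation that leaks some group information.
zs = [0, 0, 0, 1, 1, 2, 0, 1, 1, 2, 2, 2]
groups = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

pi = sum(groups) / len(groups)
mi = mutual_information(zs, groups)
bound = parity_bound(mi, pi)

# Enumerate every deterministic classifier on z and record the worst parity gap.
codes = sorted(set(zs))
worst = 0.0
for labels in product((0, 1), repeat=len(codes)):
    classifier = dict(zip(codes, labels))
    preds = [classifier[z] for z in zs]
    worst = max(worst, parity_gap(preds, groups))

print(f"I(z; c) = {mi:.4f} nats, bound = {bound:.4f}, worst gap = {worst:.4f}")
```

Because the bound depends only on I(z; c) and pi, it holds for all eight classifiers in the loop at once, which is the sense in which limiting mutual information controls "any downstream classifier". For continuous, high-dimensional representations a plug-in estimate like this is not practical, which is where the estimators discussed in the abstract come in.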