Consistent End-to-End Estimation for Counterfactual Fairness
This work addresses fairness in machine learning for legal, ethical, and societal applications, but it is incremental, as it builds on the existing counterfactual fairness setting.
The paper tackles counterfactually fair prediction by proposing a novel predictor that learns counterfactual distributions via neural networks and enforces fairness through counterfactual mediator regularization. The method comes with theoretical guarantees and achieves state-of-the-art performance on various datasets.
Fairness in predictions is of direct practical importance for legal, ethical, and societal reasons. It is often formalized through counterfactual fairness, which requires that the prediction for an individual be the same as the prediction in a counterfactual world where the individual has a different sensitive attribute. However, achieving counterfactual fairness is challenging because counterfactuals are unobservable; as a result, existing baselines for counterfactual fairness lack theoretical guarantees. In this paper, we propose a novel predictor for making counterfactually fair predictions. We follow the standard counterfactual fairness setting and directly learn the counterfactual distribution of the descendants of the sensitive attribute via tailored neural networks, which we then use to enforce fair predictions through a novel counterfactual mediator regularization. Unique to our work, we provide theoretical guarantees that our method ensures counterfactual fairness. We further evaluate our method on various datasets, where it achieves state-of-the-art performance.
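To make the idea of a counterfactual mediator regularizer more concrete, the following minimal NumPy sketch shows one plausible form of such an objective. It assumes that counterfactual mediator samples `m_cf` (the descendants of the sensitive attribute under the flipped attribute) have already been generated by a separately trained model. It also assumes a linear predictor and penalizes the squared gap between predictions made with factual and with counterfactual mediators. The function name `counterfactual_mediator_loss`, the linear model, the squared-error penalty, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def counterfactual_mediator_loss(w, x, m, m_cf, y, lam):
    """Hypothetical regularized objective (illustrative, not the paper's exact loss).

    w    : weights of an assumed linear predictor, shape (d_x + d_m,)
    x    : non-descendant covariates, shape (n, d_x)
    m    : factual mediators (descendants of the sensitive attribute), shape (n, d_m)
    m_cf : counterfactual mediators, assumed pre-sampled by a separate model, shape (n, d_m)
    y    : targets, shape (n,)
    lam  : weight of the fairness penalty (assumed hyperparameter)
    """
    # Predictions using the factual mediators.
    f = np.concatenate([x, m], axis=1) @ w
    # Predictions for the same individuals using counterfactual mediators.
    f_cf = np.concatenate([x, m_cf], axis=1) @ w
    # Standard squared-error prediction loss on the observed data.
    pred_loss = np.mean((f - y) ** 2)
    # Penalize any change in the prediction between factual and counterfactual worlds.
    fairness_penalty = np.mean((f - f_cf) ** 2)
    return pred_loss + lam * fairness_penalty
```

With `lam = 0` the objective reduces to ordinary squared-error training; increasing `lam` pushes the predictor to rely less on the parts of the mediators that change under the counterfactual sensitive attribute, trading some predictive accuracy for fairness.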