LGAICYMLJun 16, 2021

Costs and Benefits of Fair Regression

arXiv:2106.08812v211 citations
Originality Highly original
AI Analysis

This addresses fairness-accuracy tradeoffs in regression for high-stakes ML applications, providing theoretical insights and practical methods.

The paper characterizes the inherent tradeoff between statistical parity and accuracy in regression by providing a sharp, algorithm-independent lower bound on error for any fair regressor, showing that when target moments differ between groups, fair algorithms must make errors on at least one group. It also establishes connections between individual fairness, accuracy parity, and Wasserstein distance, and develops a practical algorithm tested on real-world data.

Real-world applications of machine learning tools in high-stakes domains are often regulated to be fair, in the sense that the predicted target should satisfy some quantitative notion of parity with respect to a protected attribute. However, the exact tradeoff between fairness and accuracy with a real-valued target is not entirely clear. In this paper, we characterize the inherent tradeoff between statistical parity and accuracy in the regression setting by providing a lower bound on the error of any fair regressor. Our lower bound is sharp, algorithm-independent, and admits a simple interpretation: when the moments of the target differ between groups, any fair algorithm has to make an error on at least one of the groups. We further extend this result to give a lower bound on the joint error of any (approximately) fair algorithm, using the Wasserstein distance to measure the quality of the approximation. With our novel lower bound, we also show that the price paid by a fair regressor that does not take the protected attribute as input is less than that of a fair regressor with explicit access to the protected attribute. On the upside, we establish the first connection between individual fairness, accuracy parity, and the Wasserstein distance by showing that if a regressor is individually fair, it also approximately verifies the accuracy parity, where the gap is given by the Wasserstein distance between the two groups. Inspired by our theoretical results, we develop a practical algorithm for fair regression through the lens of representation learning, and conduct experiments on a real-world dataset to corroborate our findings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes