To Pool or Not To Pool: Analyzing the Regularizing Effects of Group-Fair Training on Shared Models
This work addresses over-fitting in group-fair training, a problem relevant to machine learning practitioners, but the contribution is incremental because it builds on existing fair learning objectives.
The paper tackles performance disparities in fair machine learning by deriving group-specific generalization error bounds that exploit the majority group's larger sample size. Simulations show that these bounds improve over a naive method, especially for smaller groups.
In fair machine learning, one source of performance disparities between groups is over-fitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power-mean). Our simulations demonstrate these bounds improve over a naive method, as expected by theory, with particularly significant improvement for smaller group sizes.
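To make the idea of a group-specific Rademacher average over a restricted hypothesis class concrete, the following is a minimal sketch under stated assumptions, not the paper's construction. It assumes a finite hypothesis class represented by a matrix of per-sample losses, synthetic Bernoulli losses, a power-mean objective with p = 2, and a hypothetical tolerance `eps` that defines the restricted class as all hypotheses whose empirical objective is within `eps` of the best. The names `power_mean`, `empirical_rademacher`, and `eps` are illustrative and do not come from the paper.

```python
import numpy as np

def power_mean(risks, p, weights=None):
    """Weighted power-mean of nonnegative per-group risks (assumes p >= 1)."""
    risks = np.asarray(risks, dtype=float)
    if weights is None:
        weights = np.full(risks.shape, 1.0 / risks.size)
    if np.isinf(p):
        return float(risks.max())
    return float(np.sum(weights * risks ** p) ** (1.0 / p))

def empirical_rademacher(losses, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average.

    losses has shape (n_hypotheses, n_samples): the loss of each hypothesis
    in a finite class on each sample of a single group.
    """
    rng = np.random.default_rng(seed)
    n = losses.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # For each sign vector, take the supremum over hypotheses of
    # (1/n) * sum_i sigma_i * loss_i, then average over draws.
    correlations = sigma @ losses.T / n  # shape (n_draws, n_hypotheses)
    return float(correlations.max(axis=1).mean())

# Synthetic data (assumption): 50 hypotheses, a large and a small group.
rng = np.random.default_rng(1)
n_hyp, n_major, n_minor = 50, 1000, 20
mean_loss = rng.uniform(0.1, 0.9, size=(n_hyp, 1))
loss_major = rng.binomial(1, mean_loss, size=(n_hyp, n_major)).astype(float)
loss_minor = rng.binomial(1, mean_loss, size=(n_hyp, n_minor)).astype(float)

# Empirical power-mean objective (p = 2) of the two group risks.
objective = np.array([
    power_mean([loss_major[h].mean(), loss_minor[h].mean()], p=2)
    for h in range(n_hyp)
])

# Hypothetical restriction rule: keep hypotheses within eps of the best.
eps = 0.1
restricted = objective <= objective.min() + eps

# Minority-group Rademacher average over the full and restricted classes.
full_rad = empirical_rademacher(loss_minor)
restricted_rad = empirical_rademacher(loss_minor[restricted])
print(f"full class: {full_rad:.4f}, restricted class: {restricted_rad:.4f}")
```

Because both estimates reuse the same sign draws and the restricted class is a subset of the full class, the restricted estimate can never exceed the full one in this sketch. That mirrors, in a simplified form, why restricting attention to models that perform well on the fair objective can tighten a bound for the smaller group.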