An Algorithmic Framework for Bias Bounties
This paper addresses bias in trained machine learning models, giving stakeholders such as developers and users a structured way to identify and fix subgroups on which a model underperforms.
The authors propose a framework for bias bounties, in which external participants submit subgroup improvements to a trained model. Their algorithm incorporates these improvements with no tension between overall and subgroup accuracies, and it provably converges to either the Bayes optimal model or a state in which participants can find no further improvements.
We propose and analyze an algorithmic framework for "bias bounties": events in which external participants are invited to propose improvements to a trained model, akin to bug bounty events in software and security. Our framework allows participants to submit arbitrary subgroup improvements, which are then algorithmically incorporated into an updated model. Our algorithm has the property that there is no tension between overall and subgroup accuracies, nor between different subgroup accuracies, and it enjoys provable convergence to either the Bayes optimal model or a state in which no further improvements can be found by the participants. We provide formal analyses of our framework, experimental evaluation, and findings from a preliminary bias bounty event.
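To make the idea of incorporating a submitted subgroup improvement concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a submission is a pair of a group indicator `g` and a candidate predictor `h`, and that the submission is accepted only if `h` has lower error than the current model on the examples in `g`. The names `error`, `try_update`, and `min_gain` are hypothetical. Because predictions outside `g` are left unchanged, an accepted update in this sketch cannot raise overall error; it does not by itself protect other overlapping subgroups, which a full method would need to handle.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types for illustration: a feature vector, a binary predictor,
# and a group indicator that says whether an example belongs to a subgroup.
Features = Sequence[float]
Predictor = Callable[[Features], int]
Group = Callable[[Features], bool]
Dataset = List[Tuple[Features, int]]


def error(model: Predictor, data: Dataset, group: Optional[Group] = None) -> float:
    """Misclassification rate of `model` on `data`, restricted to `group` if given."""
    rows = [(x, y) for x, y in data if group is None or group(x)]
    if not rows:
        return 0.0
    return sum(model(x) != y for x, y in rows) / len(rows)


def try_update(
    model: Predictor, g: Group, h: Predictor, data: Dataset, min_gain: float = 0.0
) -> Tuple[Predictor, bool]:
    """Accept the submission (g, h) only if h beats the current model on g.

    Assumption: acceptance is decided on a held-out dataset `data`, and
    `min_gain` is a hypothetical threshold on the required improvement.
    """
    if error(h, data, g) + min_gain >= error(model, data, g):
        return model, False  # no improvement on g: reject, keep the model

    def updated(x: Features) -> int:
        # Use h inside the subgroup and defer to the previous model elsewhere.
        return h(x) if g(x) else model(x)

    return updated, True


# Toy data: the label is 1 exactly when the single feature is positive.
data: Dataset = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
base: Predictor = lambda x: 0  # initial model: always predicts 0

# Submission 1: on the subgroup x > 0, predict 1 (a genuine improvement).
model1, accepted1 = try_update(base, lambda x: x[0] > 0, lambda x: 1, data)

# Submission 2: on the subgroup x < 0, predict 1 (worse than the current model).
model2, accepted2 = try_update(model1, lambda x: x[0] < 0, lambda x: 1, data)

print(accepted1, error(base, data), error(model1, data))
print(accepted2, error(model2, data))
```

In this toy run the first submission is accepted and the second is rejected, so the model only changes when a participant demonstrates a real improvement on their proposed subgroup.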