FairBatch: Batch Selection for Model Fairness
FairBatch addresses the problem of demographic disparity in machine learning models by providing an easy-to-adopt solution for practitioners, requiring only a single-line code change without modifying data preprocessing or model training.
This paper introduces FairBatch, a batch selection algorithm that uses bilevel optimization to adaptively select minibatch sizes to improve model fairness. It supports various fairness measures like equal opportunity and demographic parity, and achieves comparable or superior performance against state-of-the-art methods on synthetic and real datasets.
Training a fair machine learning model is essential to prevent demographic disparity. Existing techniques for improving model fairness require broad changes in either data preprocessing or model training, rendering themselves difficult-to-adopt for potentially already complex machine learning systems. We address this problem via the lens of bilevel optimization. While keeping the standard training algorithm as an inner optimizer, we incorporate an outer optimizer so as to equip the inner problem with an additional functionality: Adaptively selecting minibatch sizes for the purpose of improving model fairness. Our batch selection algorithm, which we call FairBatch, implements this optimization and supports prominent fairness measures: equal opportunity, equalized odds, and demographic parity. FairBatch comes with a significant implementation benefit -- it does not require any modification to data preprocessing or model training. For instance, a single-line change of PyTorch code for replacing batch selection part of model training suffices to employ FairBatch. Our experiments conducted both on synthetic and benchmark real data demonstrate that FairBatch can provide such functionalities while achieving comparable (or even greater) performances against the state of the arts. Furthermore, FairBatch can readily improve fairness of any pre-trained model simply via fine-tuning. It is also compatible with existing batch selection techniques intended for different purposes, such as faster convergence, thus gracefully achieving multiple purposes.