FairGridSearch: A Framework to Compare Fairness-Enhancing Models
This work addresses model selection for fairness in critical decision-making applications. Its contribution is incremental: it builds on existing bias mitigation methods rather than introducing new ones.
The paper tackles the challenge of selecting optimal fairness-enhancing models in binary classification by proposing FairGridSearch, a framework for comparing such models. On three datasets, it finds that metric selection, base estimator choice, and classification threshold all affect fairness and accuracy, though not consistently across datasets.
Machine learning models are increasingly used in critical decision-making applications. However, these models are susceptible to replicating or even amplifying bias present in real-world data. While the literature offers various bias mitigation methods and base estimators, selecting the optimal model for a specific application remains challenging. This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models. FairGridSearch enables experimentation with different combinations of model parameters and recommends the best one. The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impact of metric selection, base estimator choice, and classification threshold on model fairness. The results highlight the importance of selecting appropriate accuracy and fairness metrics for model evaluation. Additionally, the choice of base estimator affects the effectiveness of bias mitigation methods, and the classification threshold affects fairness stability, although neither effect is consistent across all datasets. Based on these findings, future research on fairness in machine learning should consider a broader range of factors when building fair models, going beyond bias mitigation methods alone.
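The kind of search the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the combined cost (one minus accuracy plus the absolute statistical parity difference), and the toy scores below are all assumptions made for illustration, and the numbers are not results from the study. The sketch only shows the general idea of scoring every combination of candidate model and classification threshold on one accuracy metric and one fairness metric, then recommending the cheapest combination.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def statistical_parity_difference(y_pred, group):
    """Positive-prediction rate of group 0 minus that of group 1.

    The sign convention here is an assumption; libraries differ.
    """
    rate = {}
    for g in (0, 1):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        rate[g] = sum(preds) / len(preds)
    return rate[0] - rate[1]


def fair_grid_search(scores_by_model, y_true, group, thresholds):
    """Score every (model, threshold) pair and return the lowest-cost one.

    Hypothetical helper: cost = (1 - accuracy) + |SPD| is an assumed
    trade-off, chosen only to keep the example short.
    """
    results = []
    for (name, scores), thr in product(scores_by_model.items(), thresholds):
        y_pred = [int(s >= thr) for s in scores]
        acc = accuracy(y_true, y_pred)
        spd = statistical_parity_difference(y_pred, group)
        results.append({
            "model": name,
            "threshold": thr,
            "accuracy": acc,
            "spd": spd,
            "cost": (1 - acc) + abs(spd),
        })
    best = min(results, key=lambda r: r["cost"])
    return best, results


# Toy data (made up): labels, a binary protected attribute, and the
# predicted scores of two hypothetical candidate models.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
scores_by_model = {
    "model_a": [0.9, 0.6, 0.8, 0.4, 0.55, 0.2, 0.45, 0.1],
    "model_b": [0.8, 0.3, 0.7, 0.2, 0.75, 0.35, 0.65, 0.25],
}
thresholds = [0.3, 0.5, 0.7]

best, results = fair_grid_search(scores_by_model, y_true, group, thresholds)
print(best["model"], best["threshold"], best["cost"])
```

Swapping in a different accuracy metric, fairness metric, or weighting can change which combination comes out on top, which is the sensitivity to metric selection that the abstract points to.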