TowerDebias: A Novel Unfairness Removal Method Based on the Tower Property
This work addresses fairness concerns in decision-making processes that rely on black-box models, a matter with legal and ethical implications in commercial applications, though as a post-processing technique it is an incremental improvement.
The paper tackles the problem of unfair predictions from black-box machine learning models by proposing towerDebias (tDB), a post-processing method that uses the Tower Property from probability theory to reduce the influence of sensitive attributes such as race or gender, without requiring the model to be retrained.
Decision-making processes have increasingly come to rely on sophisticated machine learning tools, raising critical concerns about the fairness of their predictions with respect to sensitive groups. The widespread adoption of commercial "black-box" models necessitates careful consideration of their legal and ethical implications for consumers. When users interact with such black-box models, a key challenge arises: how can the influence of sensitive attributes, such as race or gender, be mitigated or removed from their predictions? We propose towerDebias (tDB), a novel post-processing method designed to reduce the influence of sensitive attributes in predictions made by black-box models. Our tDB approach leverages the Tower Property from probability theory to improve prediction fairness without requiring retraining of the original model. This method is highly versatile, as it requires no prior knowledge of the original algorithm's internal structure and is adaptable to a diverse range of applications. We present a formal fairness improvement theorem for tDB and showcase its effectiveness in both regression and classification tasks using multiple real-world datasets.
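To make the role of the Tower Property concrete: for an outcome Y, non-sensitive features X, and a sensitive attribute S, the property states that E[Y | X] = E[ E[Y | X, S] | X ]. If the black-box prediction is read as an estimate of E[Y | X, S], then averaging those predictions over cases with similar X gives an estimate of E[Y | X], which no longer conditions on S. The sketch below illustrates that idea only. The nearest-neighbour averaging, the function name tower_debias, the parameter k, and the synthetic data are all assumptions made for illustration; the abstract does not specify how tDB estimates the outer expectation.

```python
import numpy as np


def tower_debias(black_box_preds, X, k=10):
    """Average black-box predictions over each row's k nearest neighbours in X.

    Hypothetical helper: the neighbourhood average stands in for the outer
    expectation E[ . | X] in the Tower Property. X must exclude the
    sensitive attribute S.
    """
    X = np.asarray(X, dtype=float)
    preds = np.asarray(black_box_preds, dtype=float)
    # Pairwise squared Euclidean distances between the rows of X.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Indices of the k closest rows (each row counts as its own neighbour).
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return preds[neighbours].mean(axis=1)


# Synthetic illustration (assumed data, not from the paper).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(0.0, 1.0, size=n)   # non-sensitive feature
s = rng.integers(0, 2, size=n)      # binary sensitive attribute
raw_preds = x + 2.0 * s             # stand-in for black-box output using S

debiased = tower_debias(raw_preds, x.reshape(-1, 1), k=20)

# Gap between the two groups' mean predictions, before and after.
raw_gap = abs(raw_preds[s == 1].mean() - raw_preds[s == 0].mean())
debiased_gap = abs(debiased[s == 1].mean() - debiased[s == 0].mean())
print(f"group gap before: {raw_gap:.3f}, after: {debiased_gap:.3f}")
```

In this toy setup the gap between group means shrinks because each adjusted prediction mixes black-box outputs from both groups at similar X. The choice of k trades off residual dependence on S (small k) against loss of detail in X (large k); how tDB itself handles that trade-off is not stated in the abstract.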