Achievable Fairness on Your Data With Utility Guarantees
The paper addresses the need for dataset-specific fairness requirements and gives practitioners a robust auditing framework, though the contribution is incremental because it builds on existing frameworks such as YOTO.
The paper tackles the fairness-accuracy trade-off in machine learning by developing a computationally efficient method to approximate dataset-specific trade-off curves with statistical guarantees, showing it reliably quantifies optimal trade-offs and detects suboptimality in state-of-the-art fairness methods across tabular, image, and language datasets.
In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as imbalances or biases; applying a uniform fairness requirement across diverse datasets is therefore questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in state-of-the-art (SOTA) fairness methods.
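To make the idea of uncertainty-aware auditing concrete, the following is a minimal sketch of how one point on a trade-off curve could be given finite-sample confidence intervals from a held-out calibration set. It assumes Hoeffding's inequality and uses the demographic parity gap as the fairness metric; both are illustrative choices, and the interval construction used in the paper itself may differ. The function names (`hoeffding_halfwidth`, `accuracy_interval`, `dp_gap_interval`) are hypothetical and do not come from the paper or from any library.

```python
import math


def hoeffding_halfwidth(n, delta):
    """Half-width of a two-sided (1 - delta) Hoeffding interval
    for the mean of n i.i.d. values bounded in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def accuracy_interval(y_true, y_pred, delta=0.05):
    """(1 - delta) confidence interval for accuracy on held-out data."""
    n = len(y_true)
    acc = sum(int(t == p) for t, p in zip(y_true, y_pred)) / n
    eps = hoeffding_halfwidth(n, delta)
    return max(0.0, acc - eps), min(1.0, acc + eps)


def dp_gap_interval(y_pred, group, delta=0.05):
    """(1 - delta) confidence interval for the demographic parity gap
    |P(yhat = 1 | a = 0) - P(yhat = 1 | a = 1)| with binary group labels.

    Each group's positive rate gets a (1 - delta / 2) interval, so by a
    union bound both hold simultaneously with probability >= 1 - delta.
    """
    rates, widths = [], []
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates.append(sum(preds) / len(preds))
        widths.append(hoeffding_halfwidth(len(preds), delta / 2.0))

    diff = rates[0] - rates[1]
    eps = widths[0] + widths[1]
    lo_diff, hi_diff = diff - eps, diff + eps

    # Map the interval on the signed difference to one on its absolute value.
    if lo_diff <= 0.0 <= hi_diff:
        lo = 0.0
    else:
        lo = min(abs(lo_diff), abs(hi_diff))
    hi = max(abs(lo_diff), abs(hi_diff))
    return lo, min(1.0, hi)
```

Under these assumptions, an auditor would compare a candidate model's accuracy and fairness intervals against the estimated trade-off curve and flag the model as suboptimal only when its intervals fall clearly outside the curve's own uncertainty band, rather than on the basis of point estimates alone.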