A Differentiable Distance Approximation for Fairer Image Classification
This work addresses fairness in image classification, offering a more efficient and stable method than existing solutions, though it appears incremental because it builds on prior fairness metrics and optimisation techniques.
The paper tackles bias in AI models with respect to protected attributes such as ethnicity, age or gender. It proposes a differentiable approximation of the variance of demographics, a metric for measuring bias, and optimises it during training. The reported result is improved fairness with high classification accuracy maintained across varied tasks and datasets.
Naively trained AI models can be heavily biased. This is particularly problematic when the biases involve legally or morally protected attributes such as ethnic background, age or gender. Existing solutions to this problem require extra computation, rely on unstable adversarial optimisation, or impose losses on the feature space structure that are disconnected from fairness measures and only loosely generalise to fairness. In this work we propose a differentiable approximation of the variance of demographics, a metric that can be used to measure the bias, or unfairness, in an AI model. Our approximation can be optimised alongside the regular training objective, which eliminates the need for any extra models during training and directly improves the fairness of the regularised models. We demonstrate that our approach improves the fairness of AI models across varied task and dataset scenarios, whilst still maintaining a high level of classification accuracy. Code is available at https://bitbucket.org/nelliottrosa/base_fairness.
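To make the idea of a fairness term optimised alongside the training objective more concrete, the following is a minimal sketch in plain Python. It is not the paper's formulation: it assumes that the variance of demographics is the variance of per-group accuracy, and it swaps the hard correct/incorrect indicator for the probability assigned to the true class so that the penalty is a smooth function of the model outputs. The function names and the `weight` parameter are hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_group_accuracy_variance(probs, labels, groups):
    """Variance across demographic groups of the mean true-class probability.

    Assumption of this sketch: the predicted probability of the correct class
    stands in for the 0/1 accuracy indicator, which keeps the penalty smooth.
    """
    per_group = defaultdict(list)
    for p, y, g in zip(probs, labels, groups):
        per_group[g].append(p[y])
    # Soft accuracy of each group, then the population variance of those values.
    means = [sum(v) / len(v) for v in per_group.values()]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


def regularised_loss(logits, labels, groups, weight=1.0):
    """Mean cross-entropy plus a weighted fairness penalty (weight is hypothetical)."""
    probs = [softmax(z) for z in logits]
    ce = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
    return ce + weight * soft_group_accuracy_variance(probs, labels, groups)
```

In an automatic differentiation framework the same expression could be written with tensors and added to the classification loss, so that no auxiliary model is needed during training. Setting `weight` to zero recovers plain cross-entropy, and larger values trade classification loss against the spread of soft accuracy between groups.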