Closing the Gap in the Trade-off between Fair Representations and Accuracy
This work addresses fairness issues in machine learning models for applications relying on language representations, but it appears incremental as it builds on existing bias detection and mitigation techniques.
The paper tackles bias in natural language representations by identifying embedding-level bias through differences in reconstruction errors along principal components, and recommends mitigation methods that maintain decent classification accuracy.
The rapid development of machine learning models and their deployment in a wide range of applications have prompted discussion of the need to look beyond the accuracy of these models. Fairness is one such aspect that is deservedly gaining attention. In this work, we analyse the natural language representations of documents and sentences (i.e., encodings) for embedding-level bias that could also affect the fairness of the downstream tasks that rely on them. We identify bias in these encodings, either towards or against different sub-groups, based on the difference in their reconstruction errors along various subsets of principal components. We explore and recommend ways to mitigate such bias in the encodings while maintaining decent accuracy in the classification models that use them.
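The bias signal described in the abstract, a gap between sub-groups in reconstruction error along a chosen subset of principal components, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic data, and the choice of mean squared error per group are all assumptions made for illustration, and the abstract does not specify which subsets of components or which encoders the paper uses.

```python
import numpy as np

def principal_components(X):
    """Return the feature mean and principal directions (rows, by decreasing variance)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt

def reconstruction_error(X, mean, components):
    """Squared error per row after projecting onto the span of `components`."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

def group_mean_errors(X, groups, component_idx):
    """Mean reconstruction error per sub-group (hypothetical helper).

    `component_idx` is a list or slice selecting the subset of principal
    components used for reconstruction.
    """
    mean, vt = principal_components(X)
    errors = reconstruction_error(X, mean, vt[component_idx])
    return {g.item(): float(errors[groups == g].mean()) for g in np.unique(groups)}

# Synthetic stand-in for encodings (assumption): a large group that varies
# mostly along dimension 0 and a small group that varies along dimension 1.
rng = np.random.default_rng(0)
dim = 5
majority = 0.1 * rng.standard_normal((200, dim))
majority[:, 0] = 5.0 * rng.standard_normal(200)
minority = 0.1 * rng.standard_normal((20, dim))
minority[:, 1] = 5.0 * rng.standard_normal(20)
X = np.vstack([majority, minority])
groups = np.array([0] * 200 + [1] * 20)

# Reconstruct from the top component only and compare the groups.
print(group_mean_errors(X, groups, [0]))
```

In this toy setup the leading component is fitted mainly to the larger group, so the smaller group is reconstructed noticeably worse from it. A gap of that kind is the sort of difference the abstract treats as evidence of embedding-level bias.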