CL · CY · LG · Feb 24, 2023

Fairness in Language Models Beyond English: Gaps and Challenges

arXiv:2302.12578v2 · 288 citations · h-index: 39
Originality: Synthesis-oriented
AI Analysis

This paper addresses fairness gaps in language models for non-English speakers, but its contribution is incremental: it surveys existing challenges without proposing new solutions.

The paper tackles fairness in language models for non-English languages. It highlights that current research is concentrated on English and lacks comprehensive coverage of diverse cultures and languages, and it argues that existing dataset-driven methods therefore cannot scale fairness efforts globally.

With language models becoming increasingly ubiquitous, it has become essential to address their inequitable treatment of diverse demographic groups and factors. Most research on evaluating and mitigating fairness harms has been concentrated on English, while multilingual models and non-English languages have received comparatively little attention. This paper presents a survey of fairness in multilingual and non-English contexts, highlighting the shortcomings of current research and the difficulties faced by methods designed for English. We contend that the multitude of diverse cultures and languages across the world makes it infeasible to achieve comprehensive coverage in terms of constructing fairness datasets. Thus, the measurement and mitigation of biases must evolve beyond the current dataset-driven practices that are narrowly focused on specific dimensions and types of biases and, therefore, impossible to scale across languages and cultures.

