CL, CY, LG · Jun 29, 2025

Datasets for Fairness in Language Models: An In-Depth Survey

arXiv:2506.23411v2 · 7 citations · h-index: 17 · Has Code
Originality: Synthesis-oriented
AI Analysis

The survey addresses a critical gap for AI fairness researchers and practitioners by highlighting overlooked biases in evaluation practices, though as a survey and framework proposal its contribution is incremental.

This survey tackles the problem of underexamined fairness datasets in language model evaluation by analyzing 16 popular datasets, revealing biases that distort fairness conclusions and proposing a unified framework to improve their use.

Despite the growing reliance on fairness benchmarks to evaluate language models, the datasets that underpin these benchmarks remain critically underexamined. This survey addresses that overlooked foundation by offering a comprehensive analysis of the most widely used fairness datasets in language model research. To ground this analysis, we characterize each dataset across key dimensions, including provenance, demographic scope, annotation design, and intended use, revealing the assumptions and limitations baked into current evaluation practices. Building on this foundation, we propose a unified evaluation framework that surfaces consistent patterns of demographic disparities across benchmarks and scoring metrics. Applying this framework to sixteen popular datasets, we uncover overlooked biases that may distort conclusions about model fairness and offer guidance on selecting, combining, and interpreting these resources more effectively and responsibly. Our findings highlight an urgent need for new benchmarks that capture a broader range of social contexts and fairness notions. To support future research, we release all data, code, and results at https://github.com/vanbanTruong/Fairness-in-Large-Language-Models/tree/main/datasets, fostering transparency and reproducibility in the evaluation of language model fairness.

Code Implementations: 1 repo