CL CV · Jan 31, 2025

Token Sampling Uncertainty Does Not Explain Homogeneity Bias in Large Language Models

arXiv:2501.19337v24.91 citations · h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses a key obstacle to creating equitable language technologies by shifting focus from inference-time fixes to interventions in representation learning and training data.

The study investigated whether token-sampling uncertainty drives homogeneity bias in large language models, finding minimal differences across groups, which explains why temperature-based adjustments fail to mitigate this bias.

Homogeneity bias is a form of stereotyping in AI models in which certain groups are represented as more similar to each other than other groups are. This bias is a major obstacle to creating equitable language technologies. Across six large language models, we test whether the bias is driven by systematic differences in token-sampling uncertainty between groups. While we observe homogeneity bias using sentence similarity, we find very little difference in token-sampling uncertainty across groups. This finding elucidates why temperature-based sampling adjustments fail to mitigate homogeneity bias. It suggests researchers should prioritize interventions targeting representation learning mechanisms and training corpus composition rather than inference-time output manipulations.
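The two quantities the abstract contrasts can be illustrated with a minimal sketch. It assumes Shannon entropy of the next-token distribution as the uncertainty measure and mean pairwise cosine similarity over sentence embeddings as the homogeneity measure; the abstract does not specify the exact metrics, and the logits and embedding vectors below are made-up toy values, not data from the paper.

```python
import math
from itertools import combinations


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, dividing by the sampling temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; assumed here as a proxy for token-sampling uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs; higher means more homogeneous texts."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy inputs for two groups (illustrative only).
logits_group_a = [2.0, 1.0, 0.5, 0.1]
logits_group_b = [2.1, 0.9, 0.5, 0.2]
embeddings_group_a = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
embeddings_group_b = [[1.0, 0.0], [0.2, 1.0], [0.6, 0.6]]

for temperature in (0.5, 1.0, 1.5):
    h_a = entropy(softmax(logits_group_a, temperature))
    h_b = entropy(softmax(logits_group_b, temperature))
    print(f"T={temperature}: entropy A={h_a:.3f}, entropy B={h_b:.3f}")

print("similarity A:", round(mean_pairwise_cosine(embeddings_group_a), 3))
print("similarity B:", round(mean_pairwise_cosine(embeddings_group_b), 3))
```

In this toy setup, raising the temperature increases entropy for both groups by a similar amount, while the similarity gap comes from the embeddings themselves. That mirrors the abstract's argument in miniature: when uncertainty barely differs across groups, rescaling it at inference time has little to act on.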
