CL · Feb 16, 2024

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models

arXiv:2402.10436v1 · 5 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This research addresses bias mitigation in AI systems. It reveals persistent out-group biases in LLMs that could perpetuate discrimination, though the contribution is incremental because it builds on prior findings about biases in language models.

The study investigated cultural biases in ChatGPT across Western and Eastern languages. When assigned a specific social identity, the model showed stronger negative bias toward out-group values than positive bias toward in-group values. A replication in the political domain produced the same pattern and also revealed an intrinsic Democratic bias in LLMs.

We explored cultural biases (individualism vs. collectivism) in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend, surpassing its positive orientation towards individualism (i.e., in-group values). Conversely, when a collectivistic persona was assigned to ChatGPT in Eastern languages, a similar pattern emerged with more negative responses toward individualism (i.e., out-group values) as compared to collectivism (i.e., in-group values). The results indicate that when imbued with a particular social identity, ChatGPT distinguishes in-group from out-group, embracing in-group values while eschewing out-group values. Notably, the negativity towards the out-group, from which prejudices and discrimination arise, exceeded the positivity towards the in-group. The experiment was replicated in the political domain, and the results remained consistent. Furthermore, this replication unveiled an intrinsic Democratic bias in Large Language Models (LLMs), aligning with earlier findings and providing integral insights into mitigating such bias through prompt engineering. Extensive robustness checks were performed using varying hyperparameters and persona setup methods, with or without social identity labels, across other popular language models.
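
The in-group versus out-group comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' scoring procedure: the no-persona baseline, the function names `bias_shifts` and `out_group_dominates`, the 1-to-5 style scores, and every number below are assumptions made only for illustration. The sketch measures how far mean scores move once a persona is assigned, then checks whether the negative move on out-group values is larger in magnitude than the move on in-group values.

```python
from statistics import mean


def bias_shifts(baseline, persona, in_group, out_group):
    """Return (in_group_shift, out_group_shift).

    Each shift is the mean persona score minus the mean baseline score
    for that value scale. The baseline condition is an assumption of
    this sketch, not something stated in the abstract.
    """
    in_shift = mean(persona[in_group]) - mean(baseline[in_group])
    out_shift = mean(persona[out_group]) - mean(baseline[out_group])
    return in_shift, out_shift


def out_group_dominates(in_shift, out_shift):
    """True when the out-group shift is negative and larger in magnitude
    than the in-group shift."""
    return out_shift < 0 and abs(out_shift) > abs(in_shift)


# Hypothetical placeholder scores, NOT data from the paper.
baseline = {"individualism": [4.0, 4.0, 5.0], "collectivism": [4.0, 5.0, 4.0]}
persona = {"individualism": [5.0, 4.0, 5.0], "collectivism": [2.0, 3.0, 2.0]}

in_shift, out_shift = bias_shifts(
    baseline, persona, in_group="individualism", out_group="collectivism"
)
print(f"in-group shift: {in_shift:+.2f}, out-group shift: {out_shift:+.2f}")
print("out-group negativity dominates:", out_group_dominates(in_shift, out_shift))
```

With an individualistic persona, individualism is the in-group scale and collectivism the out-group scale; swapping the two arguments gives the collectivistic-persona case.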

