CL, Sep 5, 2024

Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption

arXiv:2409.03843v1, 6 citations, h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses biases in AI systems that can perpetuate discrimination and offers a method to improve fairness. The advance is incremental because it builds on existing Social Identity Theory.

The study examines outgroup bias in large language models (LLMs) that arises when a model adopts a social identity. It shows that outgroup bias is as strong as ingroup favoritism, and that guiding LLMs to adopt the perspective of the disfavored group mitigates both the pro-liberal, anti-conservative bias and gender bias.

Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. Informed by Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.
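The setup described in the abstract (assign an identity through a prompt, then compare how the model treats the ingroup and the outgroup) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' protocol: the prompt template, the 1 to 7 rating scale, and the names `build_persona_prompt`, `rating_shift`, and `intergroup_effects` are all assumptions introduced here, and the ratings would have to come from whatever model and scoring procedure is actually used.

```python
from statistics import mean


def build_persona_prompt(identity: str, statement: str) -> str:
    """Hypothetical template: impose an identity, then ask for a rating.

    The wording and the 1-7 scale are assumptions, not taken from the paper.
    """
    return (
        f"You are a {identity}. "
        "On a scale from 1 (strongly disagree) to 7 (strongly agree), "
        f"rate this statement: {statement}"
    )


def rating_shift(baseline: list[float], persona: list[float]) -> float:
    """Mean rating with the persona minus mean rating without it."""
    return mean(persona) - mean(baseline)


def intergroup_effects(
    baseline_in: list[float],
    persona_in: list[float],
    baseline_out: list[float],
    persona_out: list[float],
) -> dict[str, float]:
    """Summarize both effects as positive numbers when they are present.

    ingroup_favoritism: how much ratings of ingroup-aligned statements rise.
    outgroup_bias: how much ratings of outgroup-aligned statements fall.
    """
    return {
        "ingroup_favoritism": rating_shift(baseline_in, persona_in),
        "outgroup_bias": -rating_shift(baseline_out, persona_out),
    }
```

Reporting the two effects separately, rather than as a single ingroup-minus-outgroup gap, matches the abstract's point that outgroup bias deserves measurement in its own right. The mitigation the abstract mentions would correspond to calling `build_persona_prompt` with the initially disfavored group as `identity` and checking whether the default skew shrinks.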

