CL AI CY · Apr 16, 2024

White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs

arXiv:2404.10508v54.21 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses social biases in LLMs, which can perpetuate stereotypes in applications like biographies and reviews, though it is incremental as it builds on existing bias research with a new focus on language agency.

The paper tackles social biases in language agency within LLM-generated content. It introduces the LABE benchmark, which reveals that LLMs exhibit greater gender bias than human-written texts and markedly higher intersectional bias than the other bias aspects, and it proposes the MSR mitigation strategy, which reduces bias more effectively and reliably than prompt-based methods.

Social biases can manifest in language agency. However, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing the agency levels attributed to different demographic groups in model generations. LABE tests for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) models demonstrate remarkably higher levels of intersectional bias than the other bias aspects; (3) prompt-based mitigation is unstable and frequently leads to bias exacerbation. Based on our observations, we propose Mitigation via Selective Rewrite (MSR), a novel bias mitigation strategy that leverages an agency classifier to identify and selectively revise parts of generated texts that demonstrate communal traits. Empirical results show MSR to be more effective and reliable than prompt-based mitigation methods, indicating a promising research direction.
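
The selective-rewrite idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say at what granularity MSR operates or how the revision is produced, so the sentence-level loop, the function names, and both stand-in components below are assumptions made for illustration. In particular, `toy_is_communal` is a keyword check, which is exactly the kind of string matching the abstract criticizes; in practice it would be replaced by a trained agency classifier, and `toy_rewrite` by a call to a rewriting model.

```python
import re
from typing import Callable, List

# Hypothetical cue words used only by the toy classifier below;
# this list is an illustration, not taken from the paper.
COMMUNAL_CUES = {"helpful", "caring", "warm", "supportive", "kind"}


def toy_is_communal(sentence: str) -> bool:
    """Stand-in for an agency classifier: flags a sentence as communal
    if it contains any cue word. A real classifier would replace this."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & COMMUNAL_CUES)


def toy_rewrite(sentence: str) -> str:
    """Stand-in for a rewriting model: only marks the sentence so the
    selective behaviour is visible in the output."""
    return "[REWRITE: " + sentence + "]"


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def selective_rewrite(
    text: str,
    is_communal: Callable[[str], bool],
    rewrite: Callable[[str], str],
) -> str:
    """Revise only the sentences the classifier flags as communal and
    leave every other sentence untouched."""
    revised = [
        rewrite(s) if is_communal(s) else s
        for s in split_sentences(text)
    ]
    return " ".join(revised)


if __name__ == "__main__":
    letter = "She led the team. She was always helpful."
    print(selective_rewrite(letter, toy_is_communal, toy_rewrite))
```

Passing the classifier and the rewriter as callables keeps the selection logic separate from both components, so either can be swapped for a stronger model without changing the loop.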
