CL, LG | Jan 14, 2021

Persistent Anti-Muslim Bias in Large Language Models

arXiv:2101.05783v2
698 citations
Originality: Incremental advance
AI Analysis

The paper highlights a severe religious bias in AI models that affects fairness and safety for Muslim users. It is rated incremental because it builds on known bias studies.

The paper demonstrates that GPT-3 exhibits persistent anti-Muslim bias, linking 'Muslim' to 'terrorist' in 23% of test cases, and shows that using positive adjectives reduces violent completions from 66% to 20%.

It has been observed that large-scale language models capture undesirable societal biases, e.g. relating to race and gender; yet religious bias has been relatively unexplored. We demonstrate that GPT-3, a state-of-the-art contextual language model, captures persistent Muslim-violence bias. We probe GPT-3 in various ways, including prompt completion, analogical reasoning, and story generation, to understand this anti-Muslim bias, demonstrating that it appears consistently and creatively in different uses of the model and that it is severe even compared to biases about other religious groups. For instance, "Muslim" is analogized to "terrorist" in 23% of test cases, while "Jewish" is mapped to "money" in 5% of test cases. We quantify the positive distraction needed to overcome this bias with adversarial text prompts, and find that use of the most positive 6 adjectives reduces violent completions for "Muslims" from 66% to 20%, but which is still higher than for other religious groups.
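The percentages quoted above (for example, violent completions falling from 66% to 20%) are fractions of model completions that were judged violent. The sketch below shows one way such a rate could be computed. It assumes a simple keyword match, which the text above does not specify. The keyword list, the function names `is_violent` and `violent_completion_rate`, and the sample completions are all hypothetical and hand-written for illustration. They are not model output or the authors' data.

```python
import re

# Hypothetical keyword list: the abstract does not say how "violent" is decided.
VIOLENT_KEYWORDS = frozenset(
    {"shot", "shooting", "killed", "killing", "bomb", "bombing", "attack", "attacked"}
)


def is_violent(completion, keywords=VIOLENT_KEYWORDS):
    """Return True if any whole word in the completion is a violent keyword."""
    tokens = re.findall(r"[a-z]+", completion.lower())
    return any(token in keywords for token in tokens)


def violent_completion_rate(completions, keywords=VIOLENT_KEYWORDS):
    """Fraction of completions flagged as violent, between 0.0 and 1.0."""
    if not completions:
        raise ValueError("need at least one completion")
    flagged = sum(is_violent(c, keywords) for c in completions)
    return flagged / len(completions)


# Toy, hand-written completions (illustrative only, not GPT-3 output).
sample = [
    "walked into a bakery and bought bread.",
    "walked into a building and attacked the guards.",
    "walked into a library to study.",
    "walked into a park and played chess.",
]

rate = violent_completion_rate(sample)
print(f"{rate:.0%}")  # 1 of 4 completions is flagged, so this prints 25%
```

Keyword matching is crude: it misses paraphrases and flags benign uses of a word, so a rate computed this way is only a rough proxy. Comparing the rate across prompts that differ only in the named group, or before and after adding positive adjectives to the prompt, would mirror the kind of comparison the abstract reports.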

Code Implementations: 1 repo
