CL Feb 21, 2024

Beyond Hate Speech: NLP's Challenges and Opportunities in Uncovering Dehumanizing Language

Hamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang, Lasana Harris, Nafise Sadat Moosavi

arXiv:2402.13818v22.73 citations h-index: 16

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of identifying subtle dehumanizing language aimed at marginalized communities, but it is incremental: it evaluates existing models on a new task without proposing a novel method.

The paper tackles the problem of detecting dehumanizing language, a harmful form of hate speech, by evaluating four large language models (Claude, GPT, Mistral, Qwen). Only Claude achieved strong performance (over 80% F1) under optimized settings; the others performed only moderately, and the models showed disparities across target groups.

Dehumanization, i.e., denying human qualities to individuals or groups, is a particularly harmful form of hate speech that can normalize violence against marginalized communities. Despite advances in NLP for detecting general hate speech, approaches to identifying dehumanizing language remain limited due to scarce annotated data and the subtle nature of such expressions. In this work, we systematically evaluate four state-of-the-art large language models (LLMs), namely Claude, GPT, Mistral, and Qwen, for dehumanization detection. Our results show that only one model, Claude, achieves strong performance (over 80% F1) under an optimized configuration, while others, despite their capabilities, perform only moderately. Performance drops further when distinguishing dehumanization from related hate types such as derogation. We also identify systematic disparities across target groups: models tend to over-predict dehumanization for some identities (e.g., Gay men), while under-identifying it for others (e.g., Refugees). These findings motivate the need for systematic, group-level evaluation when applying pretrained language models to dehumanization detection tasks.
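The group-level evaluation the abstract calls for can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names (`f1_score`, `group_report`), the record layout, and the placeholder groups `group_a` and `group_b` are all assumptions, and the toy labels are invented purely to show the arithmetic. For each target group it computes binary F1 plus the gap between how often the model predicts dehumanization and how often the gold labels do; a positive gap suggests over-prediction, a negative gap under-identification.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = dehumanizing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def group_report(records):
    """Per-group F1 and positive rates.

    `records` is an iterable of (group, gold_label, predicted_label)
    tuples; this layout is a hypothetical choice for the sketch.
    """
    by_group = defaultdict(list)
    for group, gold, pred in records:
        by_group[group].append((gold, pred))

    report = {}
    for group, pairs in by_group.items():
        gold = [g for g, _ in pairs]
        pred = [p for _, p in pairs]
        report[group] = {
            "f1": f1_score(gold, pred),
            "gold_rate": sum(gold) / len(gold),  # share labeled dehumanizing
            "pred_rate": sum(pred) / len(pred),  # share predicted dehumanizing
        }
    return report


# Invented toy data, NOT results from the paper.
records = [
    ("group_a", 1, 1), ("group_a", 0, 1), ("group_a", 0, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 0),
]

report = group_report(records)
for group, stats in sorted(report.items()):
    gap = stats["pred_rate"] - stats["gold_rate"]
    print(f"{group}: F1={stats['f1']:.2f} prediction gap={gap:+.2f}")
```

In this toy data both groups have the same F1, yet the prediction gaps point in opposite directions, which is the kind of disparity an aggregate score alone would hide.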
