CL · AI · Jun 25, 2025

Uncovering Hidden Violent Tendencies in LLMs: A Demographic Analysis via Behavioral Vignettes

arXiv:2506.20822v1 · 1 citation · h-index: 1
Originality: Synthesis-oriented
AI Analysis

This work examines potential biases in LLMs used to detect violent content, an issue central to online safety applications. It is incremental, however, in that it applies existing methods to new data.

The study evaluated six LLMs with a social science instrument that measures violent tendencies in morally ambiguous scenarios. It found that the models' generated text often diverges from their internal preference for violent responses, and that their violent tendencies vary across demographics in ways that contradict established research.

Large language models (LLMs) are increasingly proposed for detecting and responding to violent content online, yet their ability to reason about morally ambiguous, real-world scenarios remains underexamined. We present the first study to evaluate LLMs using a validated social science instrument designed to measure human responses to everyday conflict, namely the Violent Behavior Vignette Questionnaire (VBVQ). To assess potential bias, we introduce persona-based prompting that varies race, age, and geographic identity within the United States. Six LLMs developed across different geopolitical and organizational contexts are evaluated under a unified zero-shot setting. Our study reveals two key findings: (1) LLMs' surface-level text generation often diverges from their internal preference for violent responses; (2) their violent tendencies vary across demographics, frequently contradicting established findings in criminology, social science, and psychology.
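The persona-based prompting described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the abstract does not list the exact race, age, or geographic categories or the prompt wording, so the values, the template, and the helper names `build_persona_prompt` and `persona_prompts` are hypothetical, and the scenario text is an invented placeholder rather than an actual VBVQ vignette. The sketch only shows how crossing the three demographic dimensions yields one prompt per persona for the same scenario.

```python
from itertools import product

# Hypothetical demographic values; the paper's actual categories may differ.
RACES = ["Black", "White", "Asian", "Hispanic"]
AGES = [25, 45, 65]
REGIONS = ["California", "Texas", "New York", "Ohio"]


def build_persona_prompt(race: str, age: int, region: str, vignette: str) -> str:
    """Prefix a vignette with a persona description (hypothetical template)."""
    persona = f"You are a {age}-year-old {race} person living in {region}."
    return (
        f"{persona}\n\n{vignette}\n\n"
        "Answer with the number of the response you would choose."
    )


def persona_prompts(vignette: str) -> list[dict]:
    """Cross every race, age, and region value to get one prompt per persona."""
    prompts = []
    for race, age, region in product(RACES, AGES, REGIONS):
        prompts.append({
            "race": race,
            "age": age,
            "region": region,
            "prompt": build_persona_prompt(race, age, region, vignette),
        })
    return prompts


if __name__ == "__main__":
    # Hypothetical placeholder scenario, not an actual VBVQ vignette.
    scenario = "Someone bumps into you, spills your drink, and laughs."
    prompts = persona_prompts(scenario)
    print(len(prompts))  # 4 races x 3 ages x 4 regions = 48 personas
```

Under this assumed design, each prompt would be sent to every model in the same zero-shot setting, so any difference in the chosen response can be attributed to the persona rather than to the scenario wording.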

