CL, Mar 4

The Company You Keep: How LLMs Respond to Dark Triad Traits

Zeyi Lu, Angelica Henestrosa, Pavel Chizhov, Ivan P. Yamshchikov

arXiv:2603.04299v1 0.6 h-index: 6

Originality: Incremental advance

AI Analysis

This research addresses AI sycophancy in LLMs, which could amplify harmful user behavior. It is relevant to developers designing safer conversational AI systems.

This study investigates how Large Language Models (LLMs) respond to user prompts expressing Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy). The analysis shows that all models predominantly exhibit corrective behavior but also produce reinforcing output in certain cases. Model behavior varies with the severity of the expressed traits, and responses differ in sentiment.

Large Language Models (LLMs) often exhibit highly agreeable and reinforcing conversational styles, also known as AI sycophancy. Although this behavior is encouraged, it may become problematic when models respond to user prompts that reflect negative social tendencies. Such responses risk amplifying harmful behavior rather than mitigating it. In this study, we examine how LLMs respond to user prompts expressing varying degrees of Dark Triad traits (Machiavellianism, Narcissism, and Psychopathy) using a curated dataset. Our analysis reveals differences across models: all models predominantly exhibit corrective behavior but produce reinforcing output in certain cases. Model behavior also depends on the severity level, and responses differ in sentiment. Our findings have implications for designing safer conversational systems that can detect and respond appropriately when users escalate from benign to harmful requests.
