CL AI HC | Sep 19, 2025

Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans

Deuksin Kwon, Kaleen Shrestha, Bin Han, Elena Hayoung Lee, Gale Lucas

arXiv:2509.16394v1 | 12.06 citations | h-index: 39 | EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses the problem of evaluating LLM-human alignment in socially complex interactions, which matters for AI deployment, though the advance is incremental because it builds on existing personality conditioning methods.

This study assessed how well personality-prompted LLMs align with human behavior in adversarial dispute resolution dialogues, finding that GPT-4.1 aligned best in linguistic style and emotional dynamics, while Claude-3.7-Sonnet performed best in strategic behavior, but significant gaps remained.

Large Language Models (LLMs) are increasingly deployed in socially complex, interaction-driven tasks, yet their ability to mirror human behavior in emotionally and strategically complex contexts remains underexplored. This study assesses the behavioral alignment of personality-prompted LLMs in adversarial dispute resolution by simulating multi-turn conflict dialogues that incorporate negotiation. Each LLM is guided by a matched Five-Factor personality profile to control for individual variation and enhance realism. We evaluate alignment across three dimensions: linguistic style, emotional expression (e.g., anger dynamics), and strategic behavior. GPT-4.1 achieves the closest alignment with humans in linguistic style and emotional dynamics, while Claude-3.7-Sonnet best reflects strategic behavior. Nonetheless, substantial alignment gaps persist. Our findings establish a benchmark for alignment between LLMs and humans in socially complex interactions, underscoring both the promise and the limitations of personality conditioning in dialogue modeling.
