CL · Apr 23

Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions

Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, Meeyoung Cha

arXiv:2604.21871 · 45.0

AI Analysis

For developers and users of LLMs in decision-support systems, this work highlights a critical gap between models' prescriptive moral reasoning and their social sensitivity, which may lead to misalignments in nuanced social contexts.

The study evaluates LLMs on the Whistleblower's Dilemma. The models' moral rightness judgments are fairness-oriented, and their predictions of human behavior shift toward loyalty as relational closeness increases, yet their own decisions follow the moral rightness judgments rather than those predictions. This misalignment could cause problems in real-world deployment.

Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in their internal world-modeling, which poses a gap that may lead to significant misalignments in real-world deployments.
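
The design described in the abstract (two varied dimensions crossed with three perspectives) can be sketched as a simple condition grid. This is only an illustration under stated assumptions: the abstract does not give the factor levels, the scenario wording, or the question wording, so `SEVERITY_LEVELS`, `CLOSENESS_LEVELS`, the question strings, and `build_conditions` are all hypothetical and not the authors' materials.

```python
from itertools import product

# Hypothetical factor levels: the abstract names the two dimensions
# (crime severity, relational closeness) but not their levels.
SEVERITY_LEVELS = ["minor", "serious"]
CLOSENESS_LEVELS = ["stranger", "friend", "family member"]

# The three perspectives named in the abstract. The question wording
# is illustrative, not taken from the paper.
PERSPECTIVES = {
    "moral_rightness": "What is the morally right thing to do?",
    "predicted_human_behavior": "What would a typical person do?",
    "model_decision": "What would you do?",
}


def build_conditions():
    """Return one record per (severity, closeness, perspective) cell."""
    conditions = []
    for severity, closeness, (perspective, question) in product(
        SEVERITY_LEVELS, CLOSENESS_LEVELS, PERSPECTIVES.items()
    ):
        scenario = (
            f"Someone witnesses a {closeness} commit a {severity} crime "
            "and must decide whether to report it."
        )
        conditions.append(
            {
                "severity": severity,
                "closeness": closeness,
                "perspective": perspective,
                "prompt": f"{scenario} {question}",
            }
        )
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition["perspective"], "|", condition["prompt"])
```

Crossing the factors this way keeps the scenario text identical across the three perspectives within a cell, so any difference in responses can be attributed to the perspective asked about rather than to the scenario itself.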
