CL, Jan 12

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

Jen-tse Huang, Jiantong Qin, Xueli Qiu, Sharon Levy, Michelle R. Kaufman, Mark Dredze

arXiv:2601.07972v11.1

Originality: Incremental advance

AI Analysis

This work addresses value alignment for AI safety by revealing a knowledge-action gap in LLMs. It is rated as incremental because it builds on existing theories such as Schwartz's Theory of Basic Human Values.

The study investigated how Large Language Models (LLMs) represent and enact human values in decision-making. It found near-perfect cross-model consistency in scenario-based decisions (Pearson r ≈ 1.0) but weak correspondence between self-reported and enacted values (r = 0.4 for humans, 0.3 for LLMs). LLM performance declined by up to 6.6% when models were instructed to hold a specific value rather than merely select it.

Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We present ValAct-15k, a dataset of 3,000 advice-seeking scenarios derived from Reddit, designed to elicit the ten values defined by Schwartz's Theory of Basic Human Values. Using both the scenario-based questions and the traditional value questionnaire, we evaluate ten frontier LLMs (five from U.S. companies, five from Chinese ones) and human participants ($n = 55$). We find near-perfect cross-model consistency in scenario-based decisions (Pearson $r \approx 1.0$), contrasting sharply with the broad variability observed among humans ($r \in [-0.79, 0.98]$). Yet both humans and LLMs show weak correspondence between self-reported and enacted values ($r = 0.4$ and $0.3$, respectively), revealing a systematic knowledge-action gap. When instructed to "hold" a specific value, LLMs' performance declines by up to $6.6\%$ compared to merely selecting the value, indicating a role-play aversion. These findings suggest that while alignment training yields normative value convergence, it does not eliminate the human-like incoherence between knowing and acting upon values.
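The knowledge-action gap in the abstract is reported as a Pearson correlation between a self-reported value profile and an enacted one. The following is a minimal sketch of that kind of comparison, not the authors' pipeline: the function `pearson_r`, the variable names, and every number in the two profiles are illustrative assumptions, and only the ten value names come from Schwartz's theory.

```python
from math import sqrt

# The ten basic values in Schwartz's theory.
SCHWARTZ_VALUES = [
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Benevolence", "Universalism",
]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical profiles, one entry per value in SCHWARTZ_VALUES.
# These numbers are made up for illustration; they are not data from the paper.
self_reported = [5.1, 3.2, 3.8, 4.0, 2.1, 4.4, 3.9, 3.0, 5.4, 5.6]  # questionnaire scores
enacted = [0.14, 0.06, 0.05, 0.12, 0.03, 0.16, 0.09, 0.04, 0.18, 0.13]  # share of scenario choices

gap_r = pearson_r(self_reported, enacted)
print(f"self-report vs. enacted: r = {gap_r:.2f}")
```

A value of r near 1 would mean the stated and enacted profiles rank the ten values almost identically; the abstract reports much weaker agreement for both humans and LLMs.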
