CL · Jan 12

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

arXiv:2601.07972v1
Originality: Incremental advance
AI Analysis

This work addresses value alignment for AI safety by revealing a knowledge-action gap in LLMs. The contribution is incremental, as it builds on existing frameworks such as Schwartz's Theory of Basic Human Values.

The study investigated how Large Language Models (LLMs) represent and enact human values in decision-making. It found near-perfect cross-model consistency in scenario-based decisions (Pearson r ≈ 1.0) but weak correspondence between self-reported and enacted values (r = 0.4 for humans, 0.3 for LLMs), and performance declined by up to 6.6% when models were instructed to hold a specific value rather than merely select it.

Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We present ValAct-15k, a dataset of 3,000 advice-seeking scenarios derived from Reddit, designed to elicit the ten values defined by Schwartz's Theory of Basic Human Values. Using both the scenario-based questions and the traditional value questionnaire, we evaluate ten frontier LLMs (five from U.S. companies, five from Chinese companies) and human participants ($n = 55$). We find near-perfect cross-model consistency in scenario-based decisions (Pearson $r \approx 1.0$), contrasting sharply with the broad variability observed among humans ($r \in [-0.79, 0.98]$). Yet both humans and LLMs show weak correspondence between self-reported and enacted values ($r = 0.4$ and $r = 0.3$, respectively), revealing a systematic knowledge-action gap. When instructed to "hold" a specific value, LLMs' performance declines by up to $6.6\%$ compared with merely selecting the value, indicating a role-play aversion. These findings suggest that while alignment training yields normative value convergence, it does not eliminate the human-like incoherence between knowing and acting upon values.
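The role-play comparison contrasts two prompting conditions for the same target value: asking the model to select the value versus instructing it to "hold" the value. The sketch below assumes performance is scenario-level accuracy (the share of scenarios where the model picks the option aligned with the target value) and that "up to 6.6%" means the largest drop across values. The abstract does not say whether the drop is in percentage points or relative, so the helper returns both. All class names, function names, and counts here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConditionResult:
    """Outcome of one prompting condition for one target value (hypothetical metric)."""
    correct: int  # scenarios where the chosen option matched the target value
    total: int    # scenarios evaluated under this condition

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

def decline(select: ConditionResult, hold: ConditionResult) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in %) from 'select' to 'hold'."""
    gap = select.accuracy - hold.accuracy
    return 100 * gap, 100 * gap / select.accuracy

def max_decline(results: dict[str, tuple[ConditionResult, ConditionResult]]) -> tuple[str, float]:
    """Value with the largest percentage-point drop, i.e. the 'up to X%' figure."""
    drops = {value: decline(sel, hold)[0] for value, (sel, hold) in results.items()}
    worst = max(drops, key=drops.get)
    return worst, drops[worst]

# Hypothetical counts for two values: (select condition, hold condition).
results = {
    "power": (ConditionResult(270, 300), ConditionResult(255, 300)),
    "tradition": (ConditionResult(280, 300), ConditionResult(276, 300)),
}
print(max_decline(results))
```

Reporting both the absolute and the relative drop keeps the sketch honest about the unspecified definition; a positive drop under the "hold" condition is what the abstract calls role-play aversion.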
