CY, AI | Nov 14, 2025

Differences in the Moral Foundations of Large Language Models

arXiv:2511.11790v1 | 3 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the opacity of LLMs' ethical judgment in critical domains such as politics and education. Its contribution is incremental: it applies an existing psychological framework to new data.

The study analyzed large language models using moral foundations theory to assess their ethical judgments, finding that models differ from each other and from human baselines, with differences increasing as model capabilities grow.

Large language models are increasingly being used in critical domains of politics, business, and education, but the nature of their normative ethical judgment remains opaque. Alignment research has, to date, not sufficiently utilized perspectives and insights from the field of moral psychology to inform training and evaluation of frontier models. I perform a synthetic experiment on a wide range of models from most major model providers using Jonathan Haidt's influential moral foundations theory (MFT) to elicit diverse value judgments from LLMs. Using multiple descriptive statistical approaches, I document the bias and variance of large language model responses relative to a human baseline in the original survey. My results suggest that models rely on different moral foundations from one another and from a nationally representative human baseline, and these differences increase as model capabilities increase. This work seeks to spur further analysis of LLMs using MFT, including finetuning of open-source models, and greater deliberation by policymakers on the importance of moral foundations for LLM alignment.
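The abstract's "bias and variance relative to a human baseline" can be pictured with a small sketch. It is not the author's analysis code. It assumes each model run yields one score per moral foundation on a 0 to 5 scale, and every number below is a made-up placeholder rather than a result from the paper. The names `human_baseline`, `model_runs`, and `bias_and_variance` are hypothetical.

```python
from statistics import mean, pvariance

# The five foundations of the original moral foundations questionnaire.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]

# Placeholder human baseline means (illustrative only, not survey data).
human_baseline = {
    "care": 3.5, "fairness": 3.5, "loyalty": 2.5, "authority": 2.5, "sanctity": 2.0,
}

# Placeholder per-run foundation scores for one hypothetical model.
model_runs = [
    {"care": 4.0, "fairness": 4.0, "loyalty": 2.0, "authority": 2.0, "sanctity": 1.0},
    {"care": 4.5, "fairness": 3.5, "loyalty": 2.5, "authority": 1.5, "sanctity": 1.5},
    {"care": 3.5, "fairness": 4.5, "loyalty": 1.5, "authority": 2.5, "sanctity": 2.0},
]


def bias_and_variance(runs, baseline):
    """Per foundation: mean model score minus baseline, and variance across runs."""
    summary = {}
    for foundation in FOUNDATIONS:
        scores = [run[foundation] for run in runs]
        summary[foundation] = {
            "bias": mean(scores) - baseline[foundation],
            "variance": pvariance(scores),  # population variance over the runs
        }
    return summary


summary = bias_and_variance(model_runs, human_baseline)
for foundation, stats in summary.items():
    print(f"{foundation:10s} bias={stats['bias']:+.2f} variance={stats['variance']:.3f}")
```

A positive bias here means the hypothetical model scores a foundation higher than the baseline, and the variance captures how much repeated runs disagree. The paper may define or aggregate these quantities differently.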

