CY AI · Nov 19, 2024

The Moral Mind(s) of Large Language Models

arXiv:2412.04476v3 · 5 citations · h-index: 7

AI Analysis

The study offers a new empirical framework for evaluating moral consistency and ethical alignment in AI systems, which is relevant to stakeholders in AI ethics and safety. Its contribution is incremental, however, in that it applies existing economic tools to LLMs.

The study investigated whether large language models (LLMs) exhibit consistent moral preferences by applying revealed preference theory to nearly 40 models across structured ethical dilemmas. At least one model from each major provider acted as if guided by approximately stable moral preferences. Estimated utility functions placed most models near neutral moral stances, and a similarity analysis showed both a shared core of moral reasoning and meaningful variation across models.

As large language models (LLMs) increasingly participate in tasks with ethical and societal stakes, a critical question arises: do they exhibit an emergent "moral mind" - a consistent structure of moral preferences guiding their decisions - and to what extent is this structure shared across models? To investigate this, we applied tools from revealed preference theory to nearly 40 leading LLMs, presenting each with many structured moral dilemmas spanning five foundational dimensions of ethical reasoning. Using a probabilistic rationality test, we found that at least one model from each major provider exhibited behavior consistent with approximately stable moral preferences, acting as if guided by an underlying utility function. We then estimated these utility functions and found that most models cluster around neutral moral stances. To further characterize heterogeneity, we employed a non-parametric permutation approach, constructing a probabilistic similarity network based on revealed preference patterns. The results reveal a shared core in LLMs' moral reasoning, but also meaningful variation: some models show flexible reasoning across perspectives, while others adhere to more rigid ethical profiles. These findings provide a new empirical lens for evaluating moral consistency in LLMs and offer a framework for benchmarking ethical alignment across AI systems.
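To make "behavior consistent with stable preferences" concrete, the sketch below shows the classic deterministic consistency check from revealed preference theory, the Generalized Axiom of Revealed Preference (GARP). It is an illustration only: the abstract describes a probabilistic rationality test whose details are not given here, and the budget-style encoding of choices as `prices` and `bundles`, along with the function name `violates_garp`, is an assumption made for this example rather than the authors' setup.

```python
from typing import Sequence


def violates_garp(
    prices: Sequence[Sequence[float]],
    bundles: Sequence[Sequence[float]],
) -> bool:
    """Return True if the observed choices violate GARP.

    Observation i means: bundles[i] was chosen when prices[i] applied.
    This input encoding is hypothetical, chosen for illustration.
    """
    n = len(bundles)

    def cost(p: Sequence[float], x: Sequence[float]) -> float:
        return sum(pi * xi for pi, xi in zip(p, x))

    # Direct revealed preference: x_i R0 x_j if x_j was affordable
    # when x_i was chosen, i.e. p_i . x_i >= p_i . x_j.
    revealed = [
        [cost(prices[i], bundles[i]) >= cost(prices[i], bundles[j]) for j in range(n)]
        for i in range(n)
    ]

    # Transitive closure of the relation (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            if revealed[i][k]:
                for j in range(n):
                    if revealed[k][j]:
                        revealed[i][j] = True

    # GARP violation: x_i is revealed preferred to x_j, yet x_j was chosen
    # when x_i was strictly cheaper (p_j . x_j > p_j . x_i).
    for i in range(n):
        for j in range(n):
            if revealed[i][j] and cost(prices[j], bundles[j]) > cost(prices[j], bundles[i]):
                return True
    return False


# Two observations over two goods.
consistent = violates_garp([(1, 2), (2, 1)], [(4, 1), (1, 4)])
inconsistent = violates_garp([(1, 2), (2, 1)], [(1, 4), (4, 1)])
```

When this check passes, the choices can be rationalized by some utility function (Afriat's theorem), which is the sense in which a model can be said to act "as if" guided by one. A probabilistic test such as the one the abstract mentions would presumably tolerate some violations rather than require none.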
