CY AI · Apr 8, 2025

From Stability to Inconsistency: A Study of Moral Preferences in LLMs

Monika Jotautaite, Mary Phuong, Chatrik Singh Mangat, Maria Angelica Martinez

arXiv:2504.06324v12.31 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses moral inconsistency in LLMs, a concern for both users and developers, but it is an incremental contribution because it builds on existing theory and methods.

The study addresses the problem of understanding moral biases in large language models (LLMs) by introducing a dataset grounded in Moral Foundations Theory and a novel evaluation method. It finds that state-of-the-art models have homogeneous value preferences but lack consistency.

As large language models (LLMs) increasingly integrate into our daily lives, it becomes crucial to understand their implicit biases and moral tendencies. To address this, we introduce a Moral Foundations LLM dataset (MFD-LLM) grounded in Moral Foundations Theory, which conceptualizes human morality through six core foundations. We propose a novel evaluation method that captures the full spectrum of LLMs' revealed moral preferences as they answer a range of real-world moral dilemmas. Our findings reveal that state-of-the-art models have remarkably homogeneous value preferences, yet demonstrate a lack of consistency.
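The summary does not say how revealed preferences or consistency are scored. As an illustration only, the sketch below assumes that each dilemma forces a choice between two foundations. Under that assumption, it scores a foundation's revealed preference as its win rate, and it scores consistency as the mean agreement with the majority answer across repeated or paraphrased presentations of the same dilemma. The function names, the input format, and both metrics are hypothetical and are not the MFD-LLM evaluation method.

```python
from collections import Counter

# Short labels for the six foundations of Moral Foundations Theory:
# Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion,
# Sanctity/Degradation, Liberty/Oppression.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity", "liberty"]


def revealed_preferences(choices):
    """Hypothetical metric: win rate of each foundation across dilemmas.

    `choices` is a list of (option_a, option_b, chosen) tuples, where each
    option is a foundation label and `chosen` is the one the model picked.
    Foundations that never appear are omitted from the result.
    """
    wins = Counter()
    appearances = Counter()
    for a, b, chosen in choices:
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not one of the options {a!r}, {b!r}")
        appearances[a] += 1
        appearances[b] += 1
        wins[chosen] += 1
    return {f: wins[f] / appearances[f] for f in FOUNDATIONS if appearances[f]}


def consistency(repeated_choices):
    """Hypothetical metric: mean share of answers matching the majority answer.

    `repeated_choices` maps a dilemma id to the non-empty list of answers the
    model gave across repeated or paraphrased presentations of that dilemma.
    A score of 1.0 means the model always gave the same answer.
    """
    scores = []
    for answers in repeated_choices.values():
        majority_count = Counter(answers).most_common(1)[0][1]
        scores.append(majority_count / len(answers))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    choices = [
        ("care", "loyalty", "care"),
        ("care", "fairness", "fairness"),
        ("loyalty", "authority", "loyalty"),
    ]
    print(revealed_preferences(choices))
    # prints {'care': 0.5, 'fairness': 1.0, 'loyalty': 0.5, 'authority': 0.0}
    print(consistency({"d1": ["A", "A", "B", "A"], "d2": ["B", "B"]}))
    # prints 0.875
```

In this toy scoring, two models can have near-identical win rates (homogeneous preferences) while still scoring well below 1.0 on consistency, which shows how the two findings in the abstract can both hold.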
