AI · CL · CY · Oct 23, 2023

Moral Foundations of Large Language Models

arXiv:2310.15337v1 · 98 citations · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work highlights potential risks of moral biases in LLMs, which could lead to unintended consequences in AI applications, though it is incremental as it applies an existing psychological framework to AI.

The paper analyzed large language models (LLMs) using moral foundations theory to assess biases in moral values, finding that LLMs exhibit specific moral foundations related to human values and political affiliations, with biases that can be manipulated through adversarial prompting to affect downstream tasks.

Moral foundations theory (MFT) is a psychological assessment tool that decomposes human moral reasoning into five factors, including care/harm, liberty/oppression, and sanctity/degradation (Graham et al., 2009). People vary in the weight they place on these dimensions when making moral decisions, in part due to their cultural upbringing and political ideology. As large language models (LLMs) are trained on datasets collected from the internet, they may reflect the biases that are present in such corpora. This paper uses MFT as a lens to analyze whether popular LLMs have acquired a bias towards a particular set of moral values. We analyze known LLMs and find they exhibit particular moral foundations, and show how these relate to human moral foundations and political affiliations. We also measure the consistency of these biases, or whether they vary strongly depending on the context of how the model is prompted. Finally, we show that we can adversarially select prompts that encourage the model to exhibit a particular set of moral foundations, and that this can affect the model's behavior on downstream tasks. These findings help illustrate the potential risks and unintended consequences of LLMs assuming a particular moral stance.

Code Implementations: 1 repo
