CL | Dec 27, 2024

Right vs. Right: Can LLMs Make Tough Choices?

arXiv:2412.19926v1 | 6 citations | h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding and improving LLMs' ethical decision-making for AI safety and alignment, though it is incremental as it builds on existing ethical frameworks and datasets.

The study evaluated how 20 large language models (LLMs) navigate ethical dilemmas, revealing that they exhibit pronounced preferences between conflicting moral values, such as prioritizing truth over loyalty, and that larger models tend to support a deontological perspective, maintaining choices even when negative consequences are specified.

An ethical dilemma describes a choice between two "right" options involving conflicting moral values. We present a comprehensive evaluation of how LLMs navigate ethical dilemmas. Specifically, we investigate LLMs on their (1) sensitivity in comprehending ethical dilemmas, (2) consistency in moral value choice, (3) consideration of consequences, and (4) ability to align their responses to a moral value preference explicitly or implicitly specified in a prompt. Drawing inspiration from a leading ethical framework, we construct a dataset comprising 1,730 ethical dilemmas involving four pairs of conflicting values. We evaluate 20 well-known LLMs from six families. Our experiments reveal that: (1) LLMs exhibit pronounced preferences between major value pairs, and prioritize truth over loyalty, community over individual, and long-term over short-term considerations. (2) The larger LLMs tend to support a deontological perspective, maintaining their choices of actions even when negative consequences are specified. (3) Explicit guidelines are more effective in guiding LLMs' moral choice than in-context examples. Lastly, our experiments highlight the limitation of LLMs in comprehending different formulations of ethical dilemmas.
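As a rough illustration of what a "preference between value pairs" could look like numerically, the sketch below tallies how often a model picks each value within a pair. It is not the paper's evaluation code: the record format, the `preference_rates` helper, and the sample choices are all hypothetical and exist only to make the idea concrete.

```python
from collections import Counter, defaultdict

# Hypothetical records: (value pair, value the model chose for one dilemma).
# The sample data is invented for illustration, not taken from the paper.
choices = [
    (("truth", "loyalty"), "truth"),
    (("truth", "loyalty"), "truth"),
    (("truth", "loyalty"), "loyalty"),
    (("individual", "community"), "community"),
]


def preference_rates(records):
    """Return, per value pair, the fraction of dilemmas resolved in favor of each value."""
    counts = defaultdict(Counter)
    for pair, chosen in records:
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not one of the values in {pair!r}")
        counts[pair][chosen] += 1
    return {
        pair: {value: tally[value] / sum(tally.values()) for value in pair}
        for pair, tally in counts.items()
    }


rates = preference_rates(choices)
for pair, dist in rates.items():
    print(pair, dist)
```

A rate far from 0.5 for a pair would correspond to the kind of pronounced preference the abstract reports, for example favoring truth over loyalty.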

