CLCY, Nov 4, 2023

LLMs grasp morality in concept

arXiv:2311.02294v1, 2 citations, h-index: 4
Originality: Incremental advance
AI Analysis

This paper addresses a foundational issue in AI ethics that matters to both researchers and practitioners, but its advance is incremental because it builds on existing philosophical and ethical debates.

The paper tackles the problem of how large language models (LLMs) can 'mean' anything, arguing that until this is addressed, it is unclear what imbuing LLMs with values like fairness even means. It proposes a general theory of meaning to show that LLMs already grasp human societal concepts like morality, and it suggests that current alignment methods may therefore be limited or counterproductive.

Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
