HC · Mar 24

From Morality Installation in LLMs to LLMs in Morality-as-a-System

arXiv:2603.22944 · 43.4 · h-index: 1

Predicted impact: top 43% in HC · last 90 days · Originality: Synthesis-oriented

AI Analysis

This conceptual framework addresses the interpretability-governance gap, cross-cultural plurality, and lifecycle monitoring for AI ethics and governance specialists, though it is incremental as it builds on existing methods like constitutional AI and RLHF.

The paper tackles the problem of static morality in large language models by proposing a morality-as-a-system framework, which reframes moral behavior as a dynamic, emergent property of sociotechnical systems rather than a fixed installation at training time.

Work on morality in large language models (LLMs) has progressed via constitutional AI, reinforcement learning from human feedback (RLHF), and systematic benchmarking, yet it still lacks tools to connect internal moral representations to regulatory obligations, to design cultural plurality across the full development stack, and to monitor how moral properties drift over the lifecycle of a deployed system. These difficulties share a root: the assumption that morality is installed in a model at training time. I propose instead a morality-as-a-system framework, grounded in Niklas Luhmann's social systems theory, that treats LLM morality as a dynamic, emergent property of a sociotechnical system. Moral behaviour in a deployed LLM is not fixed at training; it is continuously reproduced through interactions among seven structurally coupled components spanning the neural substrate, training data, alignment procedures, system prompts, moderation, runtime dynamics, and user interface. This is a conceptual framework paper, not an empirical study. It philosophically reframes three known challenges (the interpretability-governance gap, the cross-component plurality problem, and the absence of lifecycle monitoring) as structural coupling failures that the installation paradigm cannot diagnose. For technical researchers, it explores three illustrative hypotheses about cross-component representational inconsistency, representation-level drift as an early safety signal, and the governance advantage of lifecycle monitoring. For philosophers and governance specialists, it offers a vocabulary for specifying substrate-level monitoring obligations within existing governance frameworks. The morality-as-a-system framework does not displace elements such as constitutional AI or RLHF; it embeds them within a larger temporal and structural account and specifies the additional infrastructure those methods require.
