Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
This work addresses a gap in understanding how LLMs handle moral dilemmas under contextual variation, a question that matters for AI ethics and alignment. Its contribution is incremental in that it builds on existing moral psychology research.
The study starts from the observation that LLM moral judgment has mostly been evaluated on fixed scenarios. It introduces the Contextual MoralChoice dataset, which adds systematic contextual variations to moral dilemmas, and finds that nearly all 22 evaluated models shift toward rule-violating behavior under these variations. It also finds that alignment with human judgments in the base case does not ensure alignment in contextual sensitivity.
Human moral decisions depend heavily on context, yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with systematic contextual variations of three kinds known from moral psychology to shift human judgment: consequentialist, emotional, and relational. Evaluating 22 LLMs, we find that nearly all models are context-sensitive, shifting their judgments toward rule-violating behavior. Comparing with a human survey, we find that models and humans respond most strongly to different contextual variations, and that a model aligned with human judgments in the base case is not necessarily aligned in its contextual sensitivity. This raises the question of how to control contextual sensitivity, which we address with an activation steering approach that can reliably increase or decrease a model's contextual sensitivity.
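The abstract does not specify how the steering is implemented. As a rough illustration of what activation steering can look like in general, the sketch below uses a common difference-of-means recipe: a direction is estimated from activations on contextual versus base prompts and then added to a hidden state with a signed strength. The function names, the `alpha` parameter, and the toy activations are all assumptions made for illustration, not the authors' method or data.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(context_acts: List[Vector], base_acts: List[Vector]) -> Vector:
    """Difference of means: contextual activations minus base activations."""
    mu_ctx = mean_vector(context_acts)
    mu_base = mean_vector(base_acts)
    return [c - b for c, b in zip(mu_ctx, mu_base)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the steering direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def projection(hidden: Vector, direction: Vector) -> float:
    """Dot product of a hidden state with the steering direction."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy activations (made-up numbers, not data from the paper).
base_acts = [[0.0, 1.0], [0.0, 3.0]]
context_acts = [[2.0, 1.0], [4.0, 3.0]]

direction = steering_vector(context_acts, base_acts)

hidden = [1.0, 1.0]
amplified = steer(hidden, direction, alpha=1.0)   # hypothetical "more sensitive" setting
dampened = steer(hidden, direction, alpha=-1.0)   # hypothetical "less sensitive" setting
```

In this toy setup, a positive `alpha` moves the hidden state further along the estimated context direction and a negative `alpha` moves it the other way, which mirrors the increase-or-decrease behavior described in the abstract. In a real model the shift would be applied to a chosen layer's activations during the forward pass, and which layer and strength to use are open choices that this sketch does not address.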