Do Emotions Influence Moral Judgment in Large Language Models?
This paper identifies an alignment gap in LLMs: induced emotions systematically bias their moral judgments, a concern for developers of safe and aligned AI systems.
The paper investigates how induced emotions affect moral judgments in LLMs. It finds that positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse judgments in up to 20% of cases, whereas humans do not show such systematic shifts.
Large language models have been extensively studied for emotion recognition and moral reasoning as distinct capabilities, yet the extent to which emotions influence moral judgment remains underexplored. In this work, we develop an emotion-induction pipeline that infuses emotion into moral situations and evaluate shifts in moral acceptability across multiple datasets and LLMs. We observe a directional pattern: positive emotions increase moral acceptability and negative emotions decrease it, with effects strong enough to reverse binary moral judgments in up to 20% of cases, and with susceptibility scaling inversely with model capability. Our analysis further reveals that specific emotions can sometimes behave contrary to what their valence would predict (e.g., remorse paradoxically increases acceptability). A complementary human annotation study shows humans do not exhibit these systematic shifts, indicating an alignment gap in current LLMs.
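The two quantities the abstract reports, the shift in moral acceptability and the rate at which binary judgments reverse, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Judgment` record, the `mean_shift` and `flip_rate` helpers, the [0, 1] acceptability scale, and the 0.5 threshold for a binary verdict are hypothetical, and the scores in the usage example are toy values, not results.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    """One moral situation scored twice (hypothetical record layout)."""
    baseline: float  # acceptability score in [0, 1] without induced emotion
    induced: float   # acceptability score for the same situation with emotion induced


def mean_shift(judgments):
    """Average change in acceptability caused by emotion induction."""
    return sum(j.induced - j.baseline for j in judgments) / len(judgments)


def flip_rate(judgments, threshold=0.5):
    """Fraction of situations whose binary verdict changes.

    A score at or above `threshold` is read as "acceptable"; the
    threshold value is an assumption of this sketch.
    """
    flips = sum(
        (j.baseline >= threshold) != (j.induced >= threshold)
        for j in judgments
    )
    return flips / len(judgments)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    toy = [
        Judgment(baseline=0.40, induced=0.60),
        Judgment(baseline=0.70, induced=0.80),
        Judgment(baseline=0.55, induced=0.45),
    ]
    print("mean shift:", mean_shift(toy))
    print("flip rate:", flip_rate(toy))
```

Under these assumptions, a positive mean shift corresponds to an emotion raising acceptability and a negative one to lowering it, while the flip rate counts only the cases in which the change is large enough to cross the decision threshold.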