From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
For developers of LLM-based agents, this work addresses the problem of aligning agents with human social values in dilemma decision-making, though the gains are incremental over existing prompting methods.
The paper proposes a value-based framework that uses GraphRAG to convert principles into instructions for LLM-based agents, improving their alignment with human social values. On the DAILYDILEMMAS benchmark, the method outperforms prompt-based baselines such as ECoT, Plan-and-Solve, and Metacognitive prompting.
Wide application of LLM-based agents requires strong alignment with human social values. However, existing work still shows deficiencies in self-cognition, dilemma decision-making, and self-emotion. To remedy this, we propose a novel value-based framework that employs GraphRAG to convert principles into value-based instructions and steers the agent toward expected behavior by retrieving the instruction best suited to a given conversation context. To measure the ratio of expected behaviors, we define expected behaviors according to two well-known theories, Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotions. On the DAILYDILEMMAS benchmark, our method exhibits significant performance gains over prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting. Our method provides a basis for the emergence of self-emotion in AI systems.
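The retrieval step described in the abstract can be illustrated with a toy sketch. This is not the paper's GraphRAG pipeline: the graph contents, the keyword-overlap scoring, the one-hop expansion, and the names `ValueNode`, `retrieve_instructions`, and `expected_behavior_ratio` are all assumptions made for illustration, and the ratio shown is only one plausible reading of "ratio of expected behaviors".

```python
import re
from dataclasses import dataclass, field


@dataclass
class ValueNode:
    """Hypothetical graph node: a value, trigger keywords, and an instruction."""
    name: str
    keywords: set
    instruction: str
    neighbors: list = field(default_factory=list)


# Toy value graph loosely inspired by Maslow's hierarchy (illustrative only).
GRAPH = {
    "safety": ValueNode(
        "safety", {"danger", "risk", "harm", "unsafe"},
        "Prioritize the physical safety of the people involved.",
        ["belonging"],
    ),
    "belonging": ValueNode(
        "belonging", {"friend", "family", "trust", "lonely"},
        "Protect trust and relationships when choosing an action.",
        ["esteem"],
    ),
    "esteem": ValueNode(
        "esteem", {"respect", "reputation", "pride", "credit"},
        "Respect the dignity and achievements of others.",
    ),
}


def retrieve_instructions(context, graph, hops=1):
    """Pick the node whose keywords best overlap the context, then expand
    along graph edges for `hops` steps and return the instructions found."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scores = {name: len(node.keywords & tokens) for name, node in graph.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return []  # nothing in the graph matches this context
    selected, frontier = [best], [best]
    for _ in range(hops):
        next_frontier = []
        for name in frontier:
            for neighbor in graph[name].neighbors:
                if neighbor not in selected:
                    selected.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return [graph[name].instruction for name in selected]


def expected_behavior_ratio(chosen, expected):
    """Assumed metric: fraction of dilemmas where the agent's choice
    equals the behavior defined as expected."""
    if not chosen:
        return 0.0
    return sum(c == e for c, e in zip(chosen, expected)) / len(chosen)
```

In this sketch, the retrieved instructions would be prepended to the agent's prompt before it answers a dilemma; a real system would replace the keyword overlap with the graph-based retrieval that GraphRAG provides.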