AI · Jan 23, 2024

Towards Socially and Morally Aware RL agent: Reward Design With LLM

arXiv:2401.12459v2 · 5 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing safe and socially aware RL agents for real-world deployment. It appears incremental, building on existing safe exploration methods by integrating LLMs.

The paper tackles the problem of aligning RL agents with human values by using LLMs to supply reward signals that reflect social and moral norms. It evaluates the LLM's outputs against human feedback and reports that they can serve as direct reward signals.

When we design and deploy a Reinforcement Learning (RL) agent, the reward function motivates the agent to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values: the agent may fail to adhere to social and moral norms, which are ambiguous and context dependent, and may cause undesired outcomes such as negative side effects and unsafe exploration. Previous work has manually defined reward functions to avoid negative side effects, used human oversight for safe exploration, or used foundation models as planning tools. This work studies whether the understanding of morality and social norms in Large Language Models (LLMs) can be leveraged in RL methods augmented with safe exploration. It evaluates the language model's results against human feedback and demonstrates the language model's capability as a direct reward signal.
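The idea of a language model acting as a direct reward signal can be illustrated with a minimal sketch. This is not the paper's implementation: the keyword-based `stub_moral_score` function is a hypothetical stand-in for an LLM query, and the names `shaped_reward` and `weight` are assumptions introduced only for this example. A real system would prompt a model with a description of the state and action and parse its judgement.

```python
from typing import Callable


def stub_moral_score(description: str) -> float:
    """Hypothetical stand-in for an LLM judgement.

    Returns -1.0 when the described action looks harmful and 0.0 otherwise.
    A real setup would send the description to a language model instead.
    """
    harmful_keywords = ("break", "step on", "push")
    text = description.lower()
    return -1.0 if any(k in text for k in harmful_keywords) else 0.0


def shaped_reward(
    task_reward: float,
    description: str,
    scorer: Callable[[str], float] = stub_moral_score,
    weight: float = 1.0,
) -> float:
    """Combine the environment's task reward with a norm-based score.

    `weight` (an assumed parameter) controls how strongly the judgement
    affects the total reward seen by the RL agent.
    """
    return task_reward + weight * scorer(description)


# Example: the same task reward is penalized when the action is judged harmful.
safe = shaped_reward(1.0, "walk around the vase")
unsafe = shaped_reward(1.0, "break the vase to reach the goal")
```

Passing the scorer as a parameter keeps the sketch independent of any particular model or API, so the stub can be swapped for a real LLM call without changing the reward-combination logic.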

