RO, CL, LG | Feb 23, 2024

PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning

arXiv:2402.15420v1 | 19 citations | h-index: 7 | HRI
AI Analysis

This work aims to reduce the human query burden in robot learning. It is incremental: it extends existing preference-based RL methods with text feedback and LLMs.

The paper tackles the sample-efficiency challenge in preference-based reinforcement learning for robots by expanding each human query to include both a preference and optional text prompting, and by using a large language model to reason zero-shot over that text. It evaluates the approach in a simulated scenario and a user study, and uses the collected feedback to train socially compliant trajectories in simulated social navigation.

Preference-based reinforcement learning (RL) has emerged as a new field in robot learning, where humans play a pivotal role in shaping robot behavior by expressing preferences on different sequences of state-action pairs. However, formulating realistic policies for robots demands responses from humans to an extensive array of queries. In this work, we approach the sample-efficiency challenge by expanding the information collected per query to contain both preferences and optional text prompting. To accomplish this, we leverage the zero-shot capabilities of a large language model (LLM) to reason from the text provided by humans. To accommodate the additional query information, we reformulate the reward learning objectives to contain flexible highlights -- state-action pairs that contain relatively high information and are related to the features processed in a zero-shot fashion from a pretrained LLM. In both a simulated scenario and a user study, we reveal the effectiveness of our work by analyzing the feedback and its implications. Additionally, the collective feedback collected serves to train a robot on socially compliant trajectories in a simulated social navigation landscape. We provide video examples of the trained policies at https://sites.google.com/view/rl-predilect
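The abstract does not state the reformulated reward learning objective, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the Bradley-Terry preference model commonly used in preference-based RL, and a hypothetical per-step highlight mask (standing in for state-action pairs an LLM flagged from the text prompt) that up-weights those steps. The names `segment_return`, `preference_loss`, `highlight_mask`, and `highlight_weight` are illustrative assumptions.

```python
import numpy as np


def segment_return(rewards, highlight_mask=None, highlight_weight=1.0):
    """Sum predicted per-step rewards over one trajectory segment.

    Assumption: highlighted steps (mask == 1) are scaled by
    `highlight_weight`; all other steps keep weight 1.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.ones_like(rewards)
    if highlight_mask is not None:
        mask = np.asarray(highlight_mask, dtype=float)
        weights = weights + (highlight_weight - 1.0) * mask
    return float(np.sum(weights * rewards))


def preference_loss(ret_a, ret_b, prefers_a):
    """Bradley-Terry cross-entropy for a single preference query.

    P(a preferred over b) = sigmoid(ret_a - ret_b).
    """
    diff = ret_a - ret_b
    # Numerically stable log-sigmoid via logaddexp.
    log_p_a = -np.logaddexp(0.0, -diff)
    log_p_b = -np.logaddexp(0.0, diff)
    return float(-(log_p_a if prefers_a else log_p_b))


# Usage: two 3-step segments; the human prefers segment A and the
# (hypothetical) highlight marks step 1 of A as especially relevant.
ret_a = segment_return([0.1, 0.9, 0.2], highlight_mask=[0, 1, 0], highlight_weight=2.0)
ret_b = segment_return([0.3, 0.1, 0.2])
loss = preference_loss(ret_a, ret_b, prefers_a=True)
```

In this sketch the highlight only changes how much each step contributes to the segment return; minimizing the loss over many queries would push a learned reward model to score preferred segments higher.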

