LG CL ROJun 9, 2024

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Utsav Singh, Pramit Bhattacharyya, Vinay P. Namboodiri

arXiv:2406.05881v66.43 citations

Originality Highly original

AI Analysis

This addresses the problem of non-stationarity in hierarchical reinforcement learning for robotics, enabling more stable and efficient learning in sparse reward environments.

The paper tackles the challenge of translating natural language instructions into robotic control policies under sparse rewards by introducing LGR2, a hierarchical reinforcement learning framework that uses LLMs to generate language-guided rewards, achieving over 55% success rates on tasks and robust real-world transfer.

Large language models (LLMs) have shown remarkable abilities in logical reasoning, in-context learning, and code generation. However, translating natural language instructions into effective robotic control policies remains a significant challenge, especially for tasks requiring long-horizon planning and operating under sparse reward conditions. Hierarchical Reinforcement Learning (HRL) provides a natural framework to address this challenge in robotics; however, it typically suffers from non-stationarity caused by the changing behavior of the lower-level policy during training, destabilizing higher-level policy learning. We introduce LGR2, a novel HRL framework that leverages LLMs to generate language-guided reward functions for the higher-level policy. By decoupling high-level reward generation from low-level policy changes, LGR2 fundamentally mitigates the non-stationarity problem in off-policy HRL, enabling stable and efficient learning. To further enhance sample efficiency in sparse environments, we integrate goal-conditioned hindsight experience relabeling. Extensive experiments across simulated and real-world robotic navigation and manipulation tasks demonstrate LGR2 outperforms both hierarchical and non-hierarchical baselines, achieving over 55% success rates on challenging tasks and robust transfer to real robots, without additional fine-tuning.

View on arXiv PDF

Similar