LG, AI, CL

Feb 13, 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas

Microsoft, MIT

arXiv:2302.06692v2 | 40.6 | 269 citations | h-index: 164 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses inefficient exploration in large environments, a persistent challenge for reinforcement learning practitioners, though it is an incremental improvement over existing intrinsic motivation methods.

The paper tackles the problem of sparse rewards in reinforcement learning by using a language model to suggest goals for exploration, resulting in agents with better coverage of common-sense behaviors and improved performance on downstream tasks in environments like Crafter and Housekeep.

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks. Code is available at https://github.com/yuqingd/ellm.
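
The reward described in the abstract (an agent is rewarded when it achieves a goal that a language model suggested for its current state) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `toy_suggester` stand-in for a language model, the word-overlap similarity, and the 0.5 threshold are all assumptions made for illustration only.

```python
from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings.

    Assumption: a simple stand-in for whatever goal-matching metric is used.
    """
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def goal_reward(
    state_description: str,
    achieved: str,
    suggest_goals: Callable[[str], List[str]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> float:
    """Return an intrinsic reward if `achieved` matches a suggested goal.

    `suggest_goals` plays the role of the language model: it maps a text
    description of the current state to a list of candidate goals.
    """
    goals = suggest_goals(state_description)
    best = max((token_overlap(achieved, g) for g in goals), default=0.0)
    return best if best >= threshold else 0.0


def toy_suggester(state_description: str) -> List[str]:
    """Hypothetical stand-in for a language model call (hard-coded rules)."""
    if "tree" in state_description:
        return ["chop tree", "drink water"]
    return ["explore"]


if __name__ == "__main__":
    print(goal_reward("you see a tree and water", "chop tree", toy_suggester))
    print(goal_reward("you see a tree and water", "eat cow", toy_suggester))
```

In this sketch the reward is zero unless the described achievement is close enough to one of the suggested goals, which is how the suggestions steer exploration toward plausibly useful behaviors rather than arbitrary novelty.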
