LG · AI · CL · Feb 13, 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

Microsoft, MIT
arXiv:2302.06692v2 · 266 citations · h-index: 164 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses inefficient exploration in large environments, a practical problem for reinforcement learning practitioners, though it is an incremental improvement over existing intrinsic motivation methods.

The paper tackles the problem of sparse rewards in reinforcement learning by using a language model to suggest goals for exploration, resulting in agents with better coverage of common-sense behaviors and improved performance on downstream tasks in environments like Crafter and Housekeep.

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent's current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks. Code available at https://github.com/yuqingd/ellm.
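
As a rough illustration of the idea in the abstract (reward the agent when it achieves a goal that a language model suggested for the current state), here is a minimal sketch. It is not the authors' implementation. The names `suggest_goals_stub` and `GoalReward` are hypothetical, the keyword-based stub stands in for a real language model call, and both the exact string matching and the once-per-episode reward are simplifying assumptions made here.

```python
from typing import Callable, List, Set


def suggest_goals_stub(state_description: str) -> List[str]:
    """Hypothetical stand-in for a language model prompted with the
    agent's state description. A real system would query an LLM here."""
    goals = []
    if "tree" in state_description:
        goals.append("collect wood")
    if "water" in state_description:
        goals.append("drink water")
    return goals


class GoalReward:
    """Intrinsic reward for achieving suggested goals.

    Assumptions (not taken from the abstract): a goal counts as achieved
    on an exact string match, and each goal is rewarded once per episode.
    """

    def __init__(self, suggest_goals: Callable[[str], List[str]]):
        self.suggest_goals = suggest_goals
        self.achieved: Set[str] = set()

    def reset(self) -> None:
        # Call at the start of every episode.
        self.achieved.clear()

    def __call__(self, state_description: str, achieved_description: str) -> float:
        goals = self.suggest_goals(state_description)
        is_suggested = achieved_description in goals
        is_new = achieved_description not in self.achieved
        if is_suggested and is_new:
            self.achieved.add(achieved_description)
            return 1.0
        return 0.0


reward_fn = GoalReward(suggest_goals_stub)
state = "You see a tree and water."
r_first = reward_fn(state, "collect wood")   # suggested and new
r_repeat = reward_fn(state, "collect wood")  # already achieved this episode
r_other = reward_fn(state, "eat cow")        # not among the suggested goals
```

In this sketch the intrinsic reward would be added to, or used in place of, the environment reward during pretraining. How goals are matched to the agent's behavior is the main design choice, and the exact-match rule above is only a placeholder.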

Code Implementations: 1 repo
