CL · LG · Jul 4, 2024

Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models

arXiv:2407.03964v1 · 5 citations · h-index: 19 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses sample inefficiency in RL, a challenge relevant to AI researchers. It is an incremental advance that builds on existing LLM-guided methods.

The paper tackles low sample efficiency in reinforcement learning by using large language models to extract general background knowledge from an environment. This knowledge is then represented as potential functions for reward shaping, yielding significant sample-efficiency improvements on Minigrid and Crafter tasks.
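A minimal Python sketch of potential-based reward shaping, assuming a toy 1-D corridor and a hand-written potential `phi` (both hypothetical; in the paper's framework the potential would come from LLM-extracted background knowledge). The shaping term added to the task reward is F(s, s') = γΦ(s') − Φ(s), with the potential of a terminal state taken as 0. Along any trajectory that ends in a terminal state, the discounted shaping terms telescope to −Φ(s₀), so every policy's return shifts by an amount that depends only on the start state, and the optimal policy is unchanged.

```python
from typing import Callable, Sequence


def shaping_term(phi: Callable[[int], float], s: int, s_next: int,
                 done: bool, gamma: float) -> float:
    """F(s, s') = gamma * phi(s') - phi(s); a terminal s' gets potential 0."""
    phi_next = 0.0 if done else phi(s_next)
    return gamma * phi_next - phi(s)


def discounted_shaping(phi: Callable[[int], float], states: Sequence[int],
                       gamma: float) -> float:
    """Discounted sum of shaping terms; the last transition ends the episode."""
    total = 0.0
    last = len(states) - 2
    for t in range(len(states) - 1):
        total += gamma ** t * shaping_term(
            phi, states[t], states[t + 1], t == last, gamma)
    return total


GOAL = 4  # hypothetical goal cell on a 1-D corridor


def phi(s: int) -> float:
    """Hypothetical potential: cells closer to GOAL get a higher value."""
    return -float(abs(GOAL - s))


gamma = 0.9
direct = [0, 1, 2, 3, 4]
detour = [0, 1, 0, 1, 2, 3, 2, 3, 4]
print(discounted_shaping(phi, direct, gamma))
print(discounted_shaping(phi, detour, gamma))
```

Both trajectories start at cell 0, so both sums equal −Φ(0) = 4 up to floating-point rounding, even though the detour takes twice as many steps: the shaping speeds up credit assignment without rewarding wandering.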

Low sample efficiency is an enduring challenge of reinforcement learning (RL). With the advent of versatile large language models (LLMs), recent works impart common-sense knowledge to accelerate policy learning in RL. However, we note that such guidance is often tailored to one specific task and loses generalizability. In this paper, we introduce a framework that harnesses LLMs to extract background knowledge of an environment, that is, a general understanding of the entire environment, so that various downstream RL tasks benefit from a one-time knowledge representation. We ground LLMs by feeding them a few pre-collected experiences and asking them to delineate background knowledge of the environment. We then represent the output knowledge as potential functions for potential-based reward shaping, which preserves the optimal policy under the task rewards. We instantiate three variants to prompt LLMs for background knowledge: writing code, annotating preferences, and assigning goals. Our experiments show that these methods achieve significant sample efficiency improvements in a spectrum of downstream tasks from the Minigrid and Crafter domains.
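The "annotating preferences" variant needs some way to turn pairwise judgments into a potential function. One common option, swapped in here for illustration and not confirmed as the paper's procedure, is a Bradley–Terry model: each label "state a is preferred over state b" is given probability σ(Φ(a) − Φ(b)), and a tabular Φ is fit by gradient ascent on the log-likelihood. The abstract state names and preference pairs below are hypothetical stand-ins for LLM annotations on a Minigrid-style door-key task.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_potential_from_preferences(
    prefs: List[Tuple[str, str]], lr: float = 0.1, steps: int = 500
) -> Dict[str, float]:
    """Fit a tabular potential with a Bradley-Terry model.

    Each pair (a, b) means "a is preferred over b", modelled as
    P = sigmoid(phi[a] - phi[b]). Full-batch gradient ascent; each pair adds
    and subtracts the same amount, so the potentials keep summing to zero.
    """
    phi = {s: 0.0 for pair in prefs for s in pair}
    for _ in range(steps):
        grad = {s: 0.0 for s in phi}
        for a, b in prefs:
            # d/d(phi[a]) of log sigmoid(phi[a] - phi[b]) = 1 - sigmoid(...)
            g = 1.0 - sigmoid(phi[a] - phi[b])
            grad[a] += g
            grad[b] -= g
        for s in phi:
            phi[s] += lr * grad[s]
    return phi


# Hypothetical preference labels (preferred state listed first).
prefs = [
    ("has_key", "no_key"),
    ("door_open", "has_key"),
    ("at_goal", "door_open"),
]
phi = fit_potential_from_preferences(prefs)
for state in sorted(phi, key=phi.get):
    print(f"{state:>10s} {phi[state]:+.3f}")
```

The fitted values rise along the hypothetical chain no_key, has_key, door_open, at_goal, and the resulting `phi` could then serve as Φ in potential-based shaping. Bradley–Terry identifies Φ only up to an additive constant, which shaping tolerates because only differences of Φ enter F. Because these labels are perfectly consistent, the likelihood has no finite maximizer and the values would keep growing slowly with more steps; the fixed `steps` budget (or an L2 penalty) keeps them bounded.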

Code Implementations: 1 repo
