RO | AI | LG | Oct 16, 2023

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

arXiv:2310.10021v2 | 93 citations | h-index: 30
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of autonomous skill acquisition in reinforcement learning for complex tasks, reducing reliance on demonstrations or rewards. It is an incremental advance, building on existing bootstrapping and LLM guidance techniques.

The paper tackles the problem of learning new long-horizon tasks without expert supervision by proposing BOSS, which uses large language models to guide skill bootstrapping from primitive skills, resulting in agents that outperform prior methods on zero-shot execution of unseen tasks in realistic household environments.

We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments. Website at clvrai.com/boss.
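
The bootstrapping loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the lookup table `FOLLOW_UPS` stands in for the LLM, `always_succeeds` stands in for the environment, and every function name, skill name, and the `max_chain_len` parameter is hypothetical. The real method also trains the chained skills with reinforcement learning, which this sketch omits. It shows only the control flow: start from a primitive skill, ask a proposer which skill to chain next, and add each successfully executed chain to the skill library.

```python
from typing import Callable, Dict, List, Optional, Tuple

Skill = Tuple[str, ...]  # a skill is an ordered chain of primitive skill names

# Hypothetical stand-in for LLM guidance: last primitive -> plausible next one.
FOLLOW_UPS: Dict[str, str] = {
    "pick up mug": "place mug in sink",
    "place mug in sink": "turn on faucet",
}


def propose_next(chain: Skill, primitives: List[str]) -> Optional[str]:
    """Toy proposer; a real system would query an LLM at this point."""
    candidate = FOLLOW_UPS.get(chain[-1])
    return candidate if candidate in primitives else None


def always_succeeds(primitive: str) -> bool:
    """Toy environment (assumption): every primitive executes successfully."""
    return True


def bootstrap(
    primitives: List[str],
    propose: Callable[[Skill, List[str]], Optional[str]],
    execute: Callable[[str], bool],
    max_chain_len: int = 3,
) -> List[Skill]:
    """Grow a skill library by chaining primitives that the proposer suggests."""
    library: List[Skill] = [(p,) for p in primitives]
    for start in primitives:
        if not execute(start):
            continue
        chain: Skill = (start,)
        while len(chain) < max_chain_len:
            nxt = propose(chain, primitives)
            # Stop when the proposer has no suggestion or execution fails.
            if nxt is None or not execute(nxt):
                break
            chain = chain + (nxt,)
            if chain not in library:
                library.append(chain)  # keep the new composite skill
    return library


primitives = ["pick up mug", "place mug in sink", "turn on faucet"]
library = bootstrap(primitives, propose_next, always_succeeds)
for skill in library:
    print(" -> ".join(skill))
```

Swapping `propose_next` for a uniform random choice over `primitives` would give a rough analogue of the "naive bootstrapping" baseline the abstract mentions, under the same toy assumptions.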

