LG, AI | Oct 25, 2022

In-context Reinforcement Learning with Algorithm Distillation

Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni

arXiv:2210.14215v1 | 37.8 | 202 citations | h-index: 32

Originality: Highly original

AI Analysis

This work addresses the challenge of improving reinforcement learning efficiency for researchers and practitioners, though it appears incremental as it builds on prior sequence modeling and distillation methods.

The paper tackles the problem of distilling reinforcement learning algorithms into neural networks by modeling training histories with a causal transformer, enabling in-context policy improvement without network updates. It demonstrates that Algorithm Distillation can learn more data-efficient RL algorithms in environments with sparse rewards and pixel-based observations.

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
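The data-construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the source RL algorithm is replaced by a toy epsilon-greedy bandit learner, each bandit pull is treated as a one-step episode with no observation, and the names `run_source_learner` and `make_training_pairs` are hypothetical. It shows only how learning histories could be turned into (context, next action) examples; the causal transformer that would be trained on them is omitted.

```python
import random


def run_source_learner(arm_probs, num_steps, epsilon=0.1, seed=0):
    """Run a toy epsilon-greedy bandit learner and log its whole learning history.

    Assumption: this stands in for the source RL algorithm. Each step is a
    one-step episode, recorded as an (action, reward) tuple.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)
    history = []
    for _ in range(num_steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(arm_probs))  # explore
        else:
            action = max(range(len(arm_probs)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # running mean
        history.append((action, reward))
    return history


def make_training_pairs(history, context_len):
    """Build (context, target_action) pairs from one learning history.

    The context holds up to `context_len` preceding steps, so it spans several
    episodes of the learner's progress; the target is the action taken next.
    """
    pairs = []
    for t in range(1, len(history)):
        context = history[max(0, t - context_len):t]
        target_action = history[t][0]
        pairs.append((context, target_action))
    return pairs


# One learning history per task; the tasks here are hypothetical two-armed bandits.
tasks = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
dataset = []
for task_id, arm_probs in enumerate(tasks):
    history = run_source_learner(arm_probs, num_steps=50, seed=task_id)
    dataset.extend(make_training_pairs(history, context_len=20))
```

A sequence model trained to predict `target_action` from `context` sees the source learner at every stage of training, from early exploration to late exploitation, rather than only its final policy. That is the distinction the abstract draws from methods that distill post-learning or expert sequences.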
