LG AI · Nov 29, 2022

The Effectiveness of World Models for Continual Reinforcement Learning

Samuel Kessler, Mateusz Ostaszewski, Michał Bortkiewicz, Mateusz Żarski, Maciej Wołczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś

Oxford

arXiv:2211.15944v214.119 citations · h-index: 11 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses continual learning for reinforcement learning agents in changing environments. The contribution appears incremental, since it builds on existing world model techniques.

The paper adapts world models to continual reinforcement learning in changing environments. The resulting method, Continual-Dreamer, outperforms state-of-the-art task-agnostic methods on the Minigrid and Minihack benchmarks, with improved sample efficiency.

World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can be harnessed for continual learning, a setting in which the agent faces changing environments. World models typically employ a replay buffer for training, which extends naturally to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. The best set of choices is called Continual-Dreamer. It is task-agnostic and uses the world model for continual exploration. Continual-Dreamer is sample efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
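To make the idea of selective experience replay concrete, here is a minimal sketch of one widely used strategy, reservoir sampling, which keeps a fixed-size buffer that is a uniform random sample of everything seen so far without needing task labels. The abstract does not say which selection strategy Continual-Dreamer adopts, so this is an illustration of the general technique rather than the authors' implementation. The class name `ReservoirReplayBuffer`, its methods, and the `(task_id, step)` items are hypothetical and chosen only for this example.

```python
import random
from typing import Any, List, Optional


class ReservoirReplayBuffer:
    """Hypothetical fixed-capacity buffer using reservoir sampling.

    After n calls to add(), every item seen so far has the same
    probability (capacity / n) of being in the buffer, so old tasks
    are not simply overwritten by the most recent experience.
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[Any] = []
        self.num_seen = 0  # total number of items ever offered to the buffer
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        """Offer one item (e.g. an episode) to the buffer."""
        self.num_seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always keep the item.
            self.items.append(item)
            return
        # Buffer full: draw a slot in [0, num_seen). The new item is kept
        # only if the slot falls inside the buffer, i.e. with probability
        # capacity / num_seen, and it then replaces the item in that slot.
        slot = self._rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a batch without replacement, capped at the buffer size."""
        k = min(batch_size, len(self.items))
        return self._rng.sample(self.items, k)


if __name__ == "__main__":
    # Simulate three tasks presented one after another, with no task
    # boundary passed to the buffer (the task id is stored only so we
    # can inspect what was retained).
    buffer = ReservoirReplayBuffer(capacity=5, seed=0)
    for task_id in range(3):
        for step in range(100):
            buffer.add((task_id, step))
    print(buffer.items)
```

The buffer never receives a task boundary signal, which is what makes this kind of selection compatible with a task-agnostic setting. A first-in-first-out buffer of the same capacity would instead end up holding only the most recent task's data.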
