LG | ML | Mar 6, 2019

Continual Learning Using World Models for Pseudo-Rehearsal

arXiv:1903.02647v2 | 7 citations
Originality: Incremental advance
AI Analysis

This paper addresses continual learning without task segmentation for reinforcement learning agents. The advance is incremental, since it builds on existing pseudo-rehearsal and distillation methods.

The paper tackles catastrophic forgetting in continual learning of world models for reinforcement learning by proposing pseudo-rehearsal with internally generated episodes, showing reduced temporal prediction loss in Atari games and enabling continual policy learning.

The utility of learning a dynamics/world model of the environment in reinforcement learning has been shown in many ways. When using neural networks, however, these models suffer catastrophic forgetting when learned in a lifelong or continual fashion. Current solutions to the continual learning problem require experience to be segmented and labeled as discrete tasks; however, in continuous experience it is generally unclear what a sufficient segmentation of tasks would be. Here we propose a method to continually learn these internal world models through the interleaving of internally generated episodes of past experiences (i.e., pseudo-rehearsal). We show this method can sequentially learn unsupervised temporal prediction, without task labels, in a disparate set of Atari games. Empirically, this interleaving of the internally generated rollouts with the external environment's observations leads to a consistent reduction in temporal prediction loss compared to non-interleaved learning and is preserved over repeated random exposures to various tasks. Similarly, using a network distillation approach, we show that modern policy-gradient-based reinforcement learning algorithms can use this internal model to continually learn to optimize reward based on the world model's representation of the environment.
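The interleaving idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the paper's architecture or its Atari setup: the "world model" here is a single 3x3 linear map, the two "games" are hypothetical 2-D subspaces of a 3-D state space, and `train` replaces gradient descent with the closed-form point that gradient descent on a least-squares loss would converge to from the current weights. Every name (`W_TRUE`, `BASIS_A`, `BASIS_B`, `observe`, `train`, `dream`, `loss`) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics s_next = W_TRUE @ s (illustration only).
theta = 0.5
W_TRUE = np.array([
    [np.cos(theta), -np.sin(theta), 0.3],
    [np.sin(theta), np.cos(theta), -0.2],
    [0.0, 0.0, 0.9],
])

# Each toy "game" only visits a 2-D subspace of the 3-D state space.
BASIS_A = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
BASIS_B = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]])


def observe(basis, n):
    """Sample n real transitions (state, next_state) from one game."""
    states = rng.standard_normal((n, 2)) @ basis
    return states, states @ W_TRUE.T


def train(W, states, targets):
    """Stand-in for gradient descent run to convergence from W:
    the least-squares solution closest to the starting weights."""
    residual = targets - states @ W.T
    update, *_ = np.linalg.lstsq(states, residual, rcond=1e-10)
    return W + update.T


def dream(W, n_steps):
    """Roll out the current model from noise to get pseudo-transitions."""
    s = W @ rng.standard_normal(3)
    states = []
    for _ in range(n_steps):
        states.append(s)
        s = W @ s
    states = np.array(states)
    # Targets are the model's own next-state predictions.
    return states, states @ W.T


def loss(W, states, targets):
    """Mean squared next-state prediction error."""
    return float(np.mean((states @ W.T - targets) ** 2))


a_train = observe(BASIS_A, 100)
a_test = observe(BASIS_A, 100)
b_train = observe(BASIS_B, 100)

# Learn game A first.
W_a = train(np.zeros((3, 3)), *a_train)

# Non-interleaved: continue training on game B data only.
W_plain = train(W_a, *b_train)

# Pseudo-rehearsal: mix game B data with rollouts dreamed by the old model.
dream_states, dream_targets = dream(W_a, 50)
mixed_states = np.concatenate([b_train[0], dream_states])
mixed_targets = np.concatenate([b_train[1], dream_targets])
W_rehearsed = train(W_a, mixed_states, mixed_targets)

loss_plain = loss(W_plain, *a_test)
loss_rehearsed = loss(W_rehearsed, *a_test)
print(f"game A loss without rehearsal: {loss_plain:.3g}")
print(f"game A loss with rehearsal:    {loss_rehearsed:.3g}")
```

In this toy setup, training on game B alone shifts weights that game A's predictions depend on, so `loss_plain` stays well above zero. With the dreamed transitions mixed in, the old model's predictions remain explicit training targets and `loss_rehearsed` drops to numerical noise. No task label is passed to `train` at any point, which mirrors the label-free setting the abstract describes.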

