Continual Visual Reinforcement Learning with A Life-Long World Model
This paper addresses catastrophic forgetting in continual visual reinforcement learning, where an agent must adapt to new tasks without losing previously acquired knowledge. It represents an incremental advancement.
The paper tackles the challenge of learning physical dynamics in non-stationary environments for model-based reinforcement learning with visual inputs, and reports outperforming straightforward combinations of existing continual learning and visual RL algorithms on the DeepMind Control Suite and Meta-World benchmarks.
Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first introduce the life-long world model, which learns task-specific latent dynamics using a mixture of Gaussians and incorporates generative experience replay to mitigate catastrophic forgetting. Then, we further address the value estimation challenge for previous tasks with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the straightforward combinations of existing continual learning and visual RL algorithms on DeepMind Control Suite and Meta-World benchmarks with continual visual control tasks.
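The two ingredients named in the abstract, a mixture-of-Gaussians latent prior with one component per task and generative experience replay, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class `TaskGaussianPrior`, the function `replay_batch`, the `frozen_decoder` callable, and all dimensions are hypothetical names and values chosen for illustration. The sketch only shows how samples drawn from earlier tasks' prior components could be decoded by a frozen generator and mixed with real observations from the current task.

```python
import numpy as np


class TaskGaussianPrior:
    """Hypothetical mixture-of-Gaussians prior: one diagonal Gaussian per task."""

    def __init__(self, latent_dim, seed=0):
        self.latent_dim = latent_dim
        self.means = []      # one mean vector per task
        self.log_stds = []   # one log-std vector per task
        self.rng = np.random.default_rng(seed)

    def add_task(self, mean=None, log_std=None):
        """Register a new component and return its task id."""
        if mean is None:
            mean = np.zeros(self.latent_dim)
        if log_std is None:
            log_std = np.zeros(self.latent_dim)
        self.means.append(np.asarray(mean, dtype=float))
        self.log_stds.append(np.asarray(log_std, dtype=float))
        return len(self.means) - 1

    def sample(self, task_id, n):
        """Draw n latents from the component that belongs to task_id."""
        eps = self.rng.standard_normal((n, self.latent_dim))
        return self.means[task_id] + np.exp(self.log_stds[task_id]) * eps


def replay_batch(prior, frozen_decoder, current_obs, current_task, n_per_old_task):
    """Mix real data from the current task with generated data for earlier tasks.

    frozen_decoder(z, task_id) stands in for a copy of the generator saved
    before training on the current task began (an assumption of this sketch).
    """
    obs = [current_obs]
    task_ids = [np.full(len(current_obs), current_task)]
    for k in range(current_task):
        z = prior.sample(k, n_per_old_task)
        obs.append(frozen_decoder(z, k))
        task_ids.append(np.full(n_per_old_task, k))
    return np.concatenate(obs), np.concatenate(task_ids)


# Usage with a toy linear decoder standing in for the frozen generator.
latent_dim, obs_dim = 4, 8
prior = TaskGaussianPrior(latent_dim)
for _ in range(3):
    prior.add_task()

W = np.random.default_rng(1).standard_normal((latent_dim, obs_dim))
decoder = lambda z, k: np.tanh(z @ W)

real_obs = np.zeros((5, obs_dim))  # placeholder for current-task observations
batch, ids = replay_batch(prior, decoder, real_obs, current_task=2, n_per_old_task=3)
print(batch.shape)  # (11, 8): 5 real rows plus 3 generated rows for each of 2 old tasks
```

Training the world model on such mixed batches is what lets a replay-based method revisit earlier tasks without storing their raw data. The paper's actual model, losses, and its exploratory-conservative behavior learning are not reproduced here.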