LG AI ML

Jul 14, 2020

Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

arXiv:2007.07011v216.242 citations

Has Code

Originality Highly original

AI Analysis

This work addresses inefficient training and catastrophic forgetting in lifelong reinforcement learning, offering a novel method for a known bottleneck.

The paper tackles the slow exploration of policy gradient methods in lifelong learning by training lifelong function approximators directly via policy gradients. The authors report faster learning, convergence to better policies, and complete avoidance of catastrophic forgetting across challenging domains.

Policy gradient methods have shown success in learning control policies for high-dimensional dynamical systems. Their biggest downside is the amount of exploration they require before yielding high-performing policies. In a lifelong learning setting, in which an agent is faced with multiple consecutive tasks over its lifetime, reusing information from previously seen tasks can substantially accelerate the learning of new tasks. We provide a novel method for lifelong policy gradient learning that trains lifelong function approximators directly via policy gradients, allowing the agent to benefit from accumulated knowledge throughout the entire training process. We show empirically that our algorithm learns faster and converges to better policies than single-task and lifelong learning baselines, and completely avoids catastrophic forgetting on a variety of challenging domains.
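To make the idea of a factored policy trained directly via policy gradients more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes, for illustration only, that each task's policy parameters are a linear combination theta = L @ s of a shared basis L (reused across tasks) and a task-specific coefficient vector s, and that the policy is a linear-softmax policy updated with a plain REINFORCE step. The abstract does not state the factorization, the policy class, or the update rule, and all function names below are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def action_probs(theta, obs, n_actions):
    # Assumed linear-softmax policy: flat theta is reshaped to (n_actions, obs_dim).
    W = theta.reshape(n_actions, obs.shape[0])
    return softmax(W @ obs)

def grad_log_prob_theta(theta, obs, action, n_actions):
    # Gradient of log pi(action | obs) with respect to the flat theta.
    p = action_probs(theta, obs, n_actions)
    one_hot = np.zeros(n_actions)
    one_hot[action] = 1.0
    return np.outer(one_hot - p, obs).ravel()

def grad_log_prob_s(L, s, obs, action, n_actions):
    # Chain rule through the assumed factorization theta = L @ s.
    return L.T @ grad_log_prob_theta(L @ s, obs, action, n_actions)

def reinforce_step_on_s(L, s, episode, n_actions, lr=0.01):
    # One REINFORCE ascent step on the task-specific coefficients s only.
    # The shared basis L is held fixed here; episode is a list of
    # (obs, action, return_to_go) tuples.
    g = np.zeros_like(s)
    for obs, action, ret in episode:
        g += ret * grad_log_prob_s(L, s, obs, action, n_actions)
    return s + lr * g / len(episode)
```

In this sketch a new task would start from the shared basis L and only search over the low-dimensional vector s, which is one way accumulated knowledge could speed up learning from the first gradient step. How the shared component is updated without forgetting earlier tasks is left out here because the abstract does not describe it.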
