LG · AI · Feb 13, 2018

Evolved Policy Gradients

arXiv:1802.04821v2 · 238 citations
Originality: Incremental advance
AI Analysis

This work targets learning efficiency in reinforcement learning. It appears incremental, since it builds on existing metalearning and policy gradient approaches.

The paper tackles slow learning in gradient-based reinforcement learning by evolving a differentiable loss function that takes the agent's history into account. Agents trained on this loss learn faster on several randomized environments than an off-the-shelf policy gradient method.

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.
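
The two-level structure described in the abstract can be illustrated with a toy sketch: an outer loop evolves the parameters of a loss, and an inner loop trains a policy by gradient descent on that loss. This is not the authors' implementation. The one-dimensional task, the single-kernel loss `mean_t w_t * (theta - a_t)^2`, the evolution-strategies update, and every name and hyperparameter below (`rollout`, `learned_loss_grad`, `inner_loop`, `evolve_loss`, `phi`, `sigma`) are assumptions chosen to keep the example small. The only elements taken from the abstract are that the loss is evolved, is differentiable in the policy parameters, and applies a temporal convolution to the agent's experience.

```python
import numpy as np


def rollout(theta, goal, rng, horizon=16, noise=0.3):
    """Sample noisy actions around theta on a hypothetical 1-D reaching task.

    Reward is the negative squared distance between action and goal.
    """
    actions = theta + noise * rng.standard_normal(horizon)
    rewards = -((actions - goal) ** 2)
    return actions, rewards


def learned_loss_grad(theta, actions, rewards, phi):
    """Gradient w.r.t. theta of the toy loss L_phi = mean_t w_t * (theta - a_t)^2.

    The per-step weights w are a temporal convolution of the reward
    sequence with the kernel phi; sampled actions are treated as data.
    """
    weights = np.convolve(rewards, phi, mode="same")
    return 2.0 * np.mean(weights * (theta - actions))


def inner_loop(phi, goal, rng, steps=20, lr=0.1):
    """Train a fresh scalar policy on the loss defined by phi; return its final mean reward."""
    theta = 0.0
    for _ in range(steps):
        actions, rewards = rollout(theta, goal, rng)
        theta -= lr * learned_loss_grad(theta, actions, rewards, phi)
        theta = float(np.clip(theta, -10.0, 10.0))  # keep the toy numerically stable
    _, final_rewards = rollout(theta, goal, rng)
    return float(np.mean(final_rewards))


def evolve_loss(generations=30, population=16, sigma=0.1,
                outer_lr=0.05, kernel_size=3, seed=0):
    """Outer loop: a simple evolution-strategies update on the loss kernel phi."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(kernel_size)
    for _ in range(generations):
        eps = rng.standard_normal((population, kernel_size))
        returns = np.empty(population)
        for i in range(population):
            goal = rng.uniform(-1.0, 1.0)  # randomized task for each worker
            returns[i] = inner_loop(phi + sigma * eps[i], goal, rng)
        # Normalize returns so the update scale does not depend on reward units.
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        phi = phi + (outer_lr / (population * sigma)) * (eps.T @ advantages)
    return phi


if __name__ == "__main__":
    phi = evolve_loss()
    print("evolved kernel:", phi)
```

In this toy, a kernel whose weights track the rewards makes the inner update resemble a policy gradient step, so the outer loop has something useful to find. Whether it finds it within the default budget depends on the seed and hyperparameters.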

Code Implementations: 3 repos
