LG, AI, ML · Apr 23, 2018

Distributed Distributional Deterministic Policy Gradients

arXiv:1804.08617v1 · 537 citations
Originality: Highly original
AI Analysis

This work addresses continuous control in reinforcement learning, a problem that matters for robotics and other AI applications. It builds on existing methods, and its contribution lies in integrating several components into a single algorithm that achieves strong results.

The authors address continuous control in reinforcement learning with the D4PG algorithm. It combines the distributional perspective with distributed off-policy learning and adds improvements such as N-step returns and prioritized experience replay. The algorithm achieves state-of-the-art performance across a range of control, manipulation, and locomotion tasks.

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of $N$-step returns and prioritized experience replay. Experimentally, we examine the contribution of each of these individual components and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
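As an illustration of the $N$-step returns mentioned in the abstract, here is a minimal sketch of the standard scalar $N$-step target: the discounted sum of $N$ rewards plus a discounted bootstrap value. The function name `n_step_return`, its parameters, and the default discount are assumptions made for this example and are not taken from the paper or its code. D4PG itself applies the idea to a return distribution rather than a scalar, which this sketch does not attempt to reproduce.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor, for illustration only
) -> float:
    """Scalar N-step return: sum_k gamma^k * r_k + gamma^N * bootstrap_value.

    `rewards` holds the N rewards observed along the trajectory segment and
    `bootstrap_value` is the critic's estimate at the state reached after
    those N steps (hypothetical inputs, named here for the example).
    """
    n = len(rewards)
    discounted_rewards = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return discounted_rewards + (gamma ** n) * bootstrap_value


if __name__ == "__main__":
    # Three rewards of 1.0, a bootstrap estimate of 10.0, discount 0.5:
    # 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
    print(n_step_return([1.0, 1.0, 1.0], 10.0, gamma=0.5))
```

With `N = 1` this reduces to the usual one-step temporal-difference target, so the segment length is the only thing that changes between the two cases.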

Code Implementations · 5 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
