LG, AI, IT, RO, ML | Jun 14, 2018

Maximum a Posteriori Policy Optimisation

arXiv:1806.06920v1, 576 citations
Originality: Incremental advance
AI Analysis

This work addresses challenges in reinforcement learning for continuous control, offering improved sample efficiency and robustness, though it appears incremental as it builds on existing methods.

The authors tackled the problem of sample efficiency and robustness in deep reinforcement learning for continuous control by introducing Maximum a Posteriori Policy Optimisation (MPO), which outperformed existing methods in these aspects while achieving similar or better final performance.

We introduce a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings while achieving similar or better final performance.
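To make "coordinate ascent on a relative entropy objective" concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a single-state problem with three discrete actions, fixed action values, and a fixed temperature `eta`; the names `e_step`, `m_step`, and `run` are hypothetical. The first step picks the distribution `q` that maximises expected value minus `eta` times KL(q || pi), which has the closed form q ∝ pi · exp(Q / eta). The second step fits the policy to `q` by weighted maximum likelihood, which for an unconstrained categorical policy simply returns `q`.

```python
import numpy as np

def e_step(pi, q_values, eta):
    """Closed-form maximiser of E_q[Q] - eta * KL(q || pi): q ∝ pi * exp(Q / eta)."""
    logits = np.log(pi) + q_values / eta
    logits -= logits.max()              # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def m_step(q):
    """Weighted maximum likelihood for an unconstrained categorical policy.

    Assumption for this toy case: with no function approximator and no
    trust-region constraint, the best fit to q is q itself.
    """
    return q.copy()

def run(q_values, eta=1.0, iters=50):
    """Alternate the two steps, starting from a uniform policy."""
    pi = np.full(len(q_values), 1.0 / len(q_values))
    for _ in range(iters):
        q = e_step(pi, q_values, eta)   # improve q with pi held fixed
        pi = m_step(q)                  # fit pi to q
    return pi

# Hypothetical action values for a three-action, single-state problem.
q_values = np.array([1.0, 2.0, 0.5])
pi = run(q_values)
```

In this toy setting the policy concentrates on the highest-valued action as the iterations proceed, and a larger `eta` slows that concentration. A deep RL setting would replace the table with a learned critic and a parametric policy, which this sketch does not attempt.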

Code Implementations: 3 repos
