LG | ML | Oct 16, 2018

ProMP: Proximal Meta-Policy Search

arXiv:1810.06784v4 | 223 citations
Originality: Incremental advance
AI Analysis

This paper addresses a key bottleneck in meta-RL, poor credit assignment, and offers researchers and practitioners a more efficient and stable approach. It appears incremental because it builds on existing gradient-based methods.

The paper tackles credit assignment in meta-reinforcement learning, where poor handling leads to low sample efficiency and ineffective task identification. It develops an algorithm that controls the statistical distance between policies during meta-policy search, and it reports better sample efficiency, wall-clock time, and asymptotic performance than previous Meta-RL algorithms.

Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.
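The abstract does not spell out how the statistical distance between policies is controlled. As a minimal sketch, assuming a PPO-style clipped likelihood ratio combined with a KL penalty (one common way to keep an updated policy close to an earlier one, and not necessarily the paper's exact objective), the idea could look like the following. The function names and the values of clip_eps and kl_coeff are hypothetical and chosen only for illustration.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under
    the updated policy and the policy that collected the data.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # likelihood ratio new / old
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any incentive to push the ratio
        # outside the [1 - clip_eps, 1 + clip_eps] band.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


def sample_kl(logp_old, logp_new):
    """Monte Carlo estimate of KL(old || new) from actions sampled under old."""
    return sum(lo - ln for lo, ln in zip(logp_old, logp_new)) / len(logp_old)


def penalized_objective(logp_new, logp_old, advantages,
                        clip_eps=0.2, kl_coeff=0.1):
    """Clipped surrogate minus a KL penalty (assumed form, for illustration)."""
    surrogate = clipped_surrogate(logp_new, logp_old, advantages, clip_eps)
    return surrogate - kl_coeff * sample_kl(logp_old, logp_new)
```

When the two policies assign identical log-probabilities, the ratio is 1 and the KL estimate is 0, so the objective reduces to the mean advantage. Once the ratio leaves the clipping band in the direction the advantage favors, the surrogate stops rewarding further movement away from the old policy.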

Code Implementations: 6 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
