AI · LG · RO · Dec 28, 2016

Efficient iterative policy optimization

arXiv:1612.08967v1 · 8 citations
Originality: Incremental advance
AI Analysis

The paper addresses efficient policy optimization in reinforcement learning, particularly when the number of policy updates is limited. It appears to be an incremental advance, since it builds on existing methods.

The paper tackles the problem of finding a good policy with a limited number of policy updates. It approximates the expected policy reward by a sequence of concave lower bounds that can be maximized efficiently, which reduces the number of updates needed for good performance. It also extends existing methods to negative rewards, which makes it possible to use control variates.

We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.
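The idea of repeatedly maximizing a concave lower bound on the expected reward can be illustrated with a minimal sketch. It assumes a hypothetical toy 5-armed bandit with exactly known rewards, a tabular policy, and the standard inequality x ≥ 1 + log x applied to the ratio π(a)/q(a) between a candidate policy π and the current policy q. This is a generic minorize-maximize scheme in the spirit of the abstract, not necessarily the paper's exact bound or algorithm; the names `surrogate`, `maximize_surrogate`, `optimize` and `baseline_breaks_bound` are made up for illustration. The last function only shows why negative rewards (for example after subtracting a control-variate baseline) break this particular bound; it does not reproduce how the paper handles that case.

```python
import math
from typing import List, Tuple

# Hypothetical toy problem: a 5-armed bandit whose expected rewards are known
# exactly, so expectations are computed without sampling.
REWARDS = [0.1, 0.5, 0.2, 0.9, 0.3]


def expected_reward(pi: List[float], rewards: List[float]) -> float:
    """J(pi) = sum_a pi(a) * R(a)."""
    return sum(p * r for p, r in zip(pi, rewards))


def surrogate(pi: List[float], q: List[float], rewards: List[float]) -> float:
    """L(pi; q) = sum_a q(a) R(a) (1 + log pi(a) - log q(a)).

    Using x >= 1 + log x with x = pi(a) / q(a) gives L(pi; q) <= J(pi)
    whenever every R(a) >= 0, with equality at pi = q. For R >= 0, L is a
    nonnegative sum of logs, hence concave in pi.
    """
    return sum(qa * r * (1.0 + math.log(p) - math.log(qa))
               for p, qa, r in zip(pi, q, rewards))


def maximize_surrogate(q: List[float], rewards: List[float]) -> List[float]:
    """Closed-form argmax of L(.; q) over the simplex: pi(a) proportional to q(a) R(a).

    Only valid when all rewards are nonnegative and not all zero.
    """
    w = [qa * r for qa, r in zip(q, rewards)]
    total = sum(w)
    return [wa / total for wa in w]


def optimize(rewards: List[float], num_updates: int) -> Tuple[List[float], List[float]]:
    """Run a fixed number of policy updates, each maximizing the current bound."""
    q = [1.0 / len(rewards)] * len(rewards)  # uniform initial policy
    history = [expected_reward(q, rewards)]
    for _ in range(num_updates):
        new_q = maximize_surrogate(q, rewards)
        # Minorize-maximize chain: J(new_q) >= L(new_q; q) >= L(q; q) = J(q).
        assert surrogate(new_q, q, rewards) <= expected_reward(new_q, rewards) + 1e-12
        q = new_q
        history.append(expected_reward(q, rewards))
    return q, history


def baseline_breaks_bound(rewards: List[float]) -> Tuple[float, float]:
    """Subtract a baseline b = J(q) (a simple control variate) and compare
    L(pi; q) with the shifted objective J(pi) - b for one policy pi.

    Shifted rewards R - b can be negative; multiplying x >= 1 + log x by a
    negative reward flips the inequality, so L is no longer a lower bound.
    """
    n = len(rewards)
    q = [1.0 / n] * n
    b = expected_reward(q, rewards)
    shifted = [r - b for r in rewards]
    # A policy that puts almost all of its mass on the best action.
    best = max(range(n), key=lambda a: rewards[a])
    eps = 1e-6
    pi = [eps] * n
    pi[best] = 1.0 - eps * (n - 1)
    return surrogate(pi, q, shifted), expected_reward(pi, shifted)


if __name__ == "__main__":
    policy, history = optimize(REWARDS, num_updates=10)
    print("J after each update:", [round(j, 3) for j in history])
    print("final policy:", [round(p, 3) for p in policy])
    print("shifted surrogate vs. shifted objective:", baseline_breaks_bound(REWARDS))
```

Because the bound is tight at the current policy, each update can take a large step while still guaranteeing that the expected reward does not decrease, which is one way such schemes can get by with fewer policy updates. In this tabular toy case the maximizer has a closed form; with a parametric policy (for example a softmax over features), maximizing the bound would instead require a numerical concave optimization.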

