AI LG RO Dec 28, 2016

Efficient iterative policy optimization

arXiv:1612.08967v111.38 citations

Originality: Incremental advance

AI Analysis

This paper addresses efficient policy optimization in reinforcement learning, particularly when the number of policy updates is limited. It appears incremental because it builds on existing methods.

The paper tackles the problem of finding a good policy with a limited number of policy updates. It approximates the expected policy reward with a sequence of concave lower bounds that can be maximized efficiently, which reduces the number of updates needed for good performance. It also extends existing methods to handle negative rewards, which enables the use of control variates.

We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates.
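As an illustration of the idea of iterating over concave lower bounds, here is a minimal sketch on a three-armed bandit with a softmax policy. It is not taken from the paper: the specific bound used (from the inequality x >= 1 + log x, which requires nonnegative rewards), the exact expectations over actions, the gradient-ascent inner solver, and all function names are assumptions chosen for clarity. The negative-reward extension mentioned in the abstract is not covered.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_reward(theta, rewards):
    # Exact expected reward J(theta) = sum_a pi_theta(a) * r(a).
    return sum(p * r for p, r in zip(softmax(theta), rewards))

def lower_bound(theta, theta_old, rewards):
    # Assumed surrogate: E_{a ~ pi_old}[ r(a) * (1 + log(pi_theta(a) / pi_old(a))) ].
    # Since x >= 1 + log x and r(a) >= 0, this never exceeds J(theta),
    # and it equals J(theta_old) at theta = theta_old.
    p_new, p_old = softmax(theta), softmax(theta_old)
    return sum(po * r * (1.0 + math.log(pn / po))
               for pn, po, r in zip(p_new, p_old, rewards))

def maximize_bound(theta_old, rewards, steps=200, lr=0.5):
    # Gradient ascent on the surrogate, which is concave in theta for a
    # softmax policy because log pi_theta(a) is concave in theta.
    p_old = softmax(theta_old)
    w = [po * r for po, r in zip(p_old, rewards)]  # reward-weighted old policy
    total = sum(w)
    theta = list(theta_old)
    for _ in range(steps):
        p = softmax(theta)
        # d/d theta_k of sum_a w_a * log pi_theta(a) is w_k - total * p_k.
        theta = [t + lr * (wk - total * pk) for t, wk, pk in zip(theta, w, p)]
    return theta

def run(rewards, policy_updates=3):
    # Each outer iteration is one "policy update": build the bound at the
    # current policy, maximize it fully, then adopt the maximizer.
    theta = [0.0] * len(rewards)
    history = [expected_reward(theta, rewards)]
    for _ in range(policy_updates):
        theta = maximize_bound(theta, rewards)
        history.append(expected_reward(theta, rewards))
    return history

if __name__ == "__main__":
    for i, j in enumerate(run([1.0, 2.0, 4.0])):
        print(f"after {i} policy updates: expected reward = {j:.4f}")
```

Because the surrogate touches the true objective at the current policy and lies below it elsewhere, fully maximizing it cannot decrease the expected reward. All the inner optimization work happens against a fixed old policy, so the number of actual policy updates stays small.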
