LG AI RO ML · Dec 26, 2019

Quasi-Newton Trust Region Policy Optimization

Devesh Jha, Arvind Raghunathan, Diego Romeres

arXiv:1912.11912v16.013 citations

Originality: Incremental advance

AI Analysis

This work addresses efficiency and performance issues in reinforcement learning for continuous control tasks, representing an incremental improvement over existing methods.

The paper addresses two drawbacks of gradient descent for policy optimization in continuous-control reinforcement learning: slow convergence and the lack of a step-size selection criterion. It proposes Quasi-Newton Trust Region Policy Optimization (QNTRPO), which improves performance and sample efficiency across challenging continuous control tasks.
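A trust region method replaces the missing step-size rule with an explicit acceptance test. The block below is a minimal sketch of the textbook ratio test, not the paper's exact rule. The 0.25/0.75 thresholds, the quartering/doubling factors, `eta`, `max_radius`, and the function name `update_radius` are all illustrative assumptions. It compares the actual improvement in the objective with the improvement predicted by the local quadratic model, then accepts or rejects the step and resizes the region.

```python
import math


def update_radius(actual_improvement, predicted_improvement, step_norm, radius,
                  max_radius=1.0, eta=0.1):
    """Textbook trust-region acceptance test and radius update (illustrative).

    Returns (accept_step, new_radius). Thresholds and scaling factors are
    assumed common defaults, not values taken from the QNTRPO paper.
    """
    if predicted_improvement <= 0:
        # The quadratic model predicts no gain: reject the step and shrink.
        return False, 0.25 * radius
    # rho measures how well the model predicted the true change.
    rho = actual_improvement / predicted_improvement
    if rho < 0.25:
        # Poor agreement: shrink the region.
        new_radius = 0.25 * radius
    elif rho > 0.75 and math.isclose(step_norm, radius, rel_tol=1e-9):
        # Good agreement and the step hit the boundary: allow a larger region.
        new_radius = min(2.0 * radius, max_radius)
    else:
        new_radius = radius
    # Accept only if the actual improvement is a sufficient fraction of the prediction.
    return rho > eta, new_radius
```

For example, `update_radius(0.1, 1.0, 0.3, 0.5)` rejects the step and shrinks the radius to 0.125, because only 10% of the predicted improvement was realized. The step size is thus chosen from observed model accuracy rather than from a hand-tuned learning rate.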

We propose a trust region method for policy optimization that employs a Quasi-Newton approximation for the Hessian, called Quasi-Newton Trust Region Policy Optimization (QNTRPO). Gradient descent is the de facto algorithm for reinforcement learning tasks with continuous controls. The algorithm has achieved state-of-the-art performance when used in reinforcement learning across a wide range of tasks. However, the algorithm suffers from a number of drawbacks, including the lack of a step-size selection criterion and slow convergence. We investigate the use of a trust region method using a dogleg step and a Quasi-Newton approximation for the Hessian for policy optimization. We demonstrate through numerical experiments over a wide range of challenging continuous control tasks that our particular choice is efficient in terms of number of samples and improves performance.
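The combination of a dogleg step with a Quasi-Newton Hessian approximation can be illustrated on a generic problem. The block below is a minimal pure-Python sketch under several assumptions. It minimizes a loss L(θ) = -J(θ), uses a Euclidean-norm trust region, and uses the standard BFGS update for the Hessian approximation B. The paper's trust-region metric and exact Quasi-Newton variant may differ, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - dot(A[r][r + 1:n], x[r + 1:])) / A[r][r]
    return x


def dogleg_step(g, B, radius):
    """Dogleg step for min_p g.p + 0.5 p.B.p subject to ||p|| <= radius.

    Assumes B is symmetric positive definite (BFGS preserves this).
    """
    # Full Quasi-Newton step: p = -B^{-1} g.
    p_newton = [-x for x in solve(B, g)]
    if math.sqrt(dot(p_newton, p_newton)) <= radius:
        return p_newton  # fits inside the trust region
    # Cauchy point: minimizer of the model along the steepest-descent direction.
    p_cauchy = [-(dot(g, g) / dot(g, matvec(B, g))) * x for x in g]
    norm_c = math.sqrt(dot(p_cauchy, p_cauchy))
    if norm_c >= radius:
        # Steepest-descent step truncated at the boundary.
        return [radius / norm_c * x for x in p_cauchy]
    # Otherwise find tau in [0, 1] with ||p_cauchy + tau * d|| = radius.
    d = [pn - pc for pn, pc in zip(p_newton, p_cauchy)]
    a = dot(d, d)
    b = 2.0 * dot(p_cauchy, d)
    c = dot(p_cauchy, p_cauchy) - radius ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + tau * di for pc, di in zip(p_cauchy, d)]


def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation from step s and gradient change y."""
    sy = dot(s, y)
    if sy <= 1e-10:
        return B  # skip: curvature condition s.y > 0 fails
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    n = len(s)
    # B+ = B - (Bs)(Bs)^T / (s^T B s) + y y^T / (s^T y); satisfies B+ s = y.
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]
```

In a policy-optimization loop, `g` would be the estimated gradient of the loss, `s` the accepted parameter step, and `y` the change in gradient between iterates. The dogleg path runs from the steepest-descent (Cauchy) point toward the full Quasi-Newton step. It therefore needs only one linear solve per iteration rather than an exact solution of the trust-region subproblem.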

