LG ML

Oct 7, 2019

Multi-step Greedy Reinforcement Learning Algorithms

Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

arXiv:1910.02919v39.512 citations

Originality: Incremental advance

AI Analysis

This work aims to improve the performance of model-free reinforcement learning in applications such as gaming and robotics. It is an incremental advance because it builds on existing RL methods.

The paper tackles the problem of improving model-free reinforcement learning performance by introducing multi-step greedy algorithms, κ-Policy Iteration and κ-Value Iteration, which use surrogate decision problems with shaped rewards and reduced discount factors. Results show that for appropriate κ values, these algorithms outperform DQN and TRPO on Atari and MuJoCo benchmarks, indicating significant performance gains.

Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: $κ$-Policy Iteration ($κ$-PI) and $κ$-Value Iteration ($κ$-VI). These methods iteratively compute the next policy ($κ$-PI) and value function ($κ$-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on $κ$-PI and $κ$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and suggest a way to set this parameter. When evaluated on a range of Atari and MuJoCo benchmark tasks, our results indicate that for the right range of $κ$, our algorithms outperform DQN and TRPO. This shows that our multi-step greedy algorithms are general enough to be applied over any existing RL algorithm and can significantly improve its performance.
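To make the "surrogate decision problem with a shaped reward and a smaller discount factor" concrete, here is a minimal tabular sketch of $κ$-PI on a toy two-state MDP. It is not the paper's model-free algorithm. The abstract does not spell out the shaping, so the sketch assumes the form usually given for $κ$-greedy policies: surrogate reward $r + γ(1-κ)V(s')$ and surrogate discount $γκ$. The toy MDP (`P`, `R`, `GAMMA`) and all function names are hypothetical, and the surrogate is solved here by plain value iteration rather than by DQN or TRPO.

```python
# Illustrative tabular kappa-PI sketch; MDP and helper names are hypothetical.
# Assumed shaping: reward r + gamma*(1-kappa)*V(s'), discount gamma*kappa.

GAMMA = 0.9
# P[s][a][t]: probability of moving from state s to state t under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to 0
]
# R[s][a]: immediate reward for taking action a in state s.
R = [
    [1.0, 0.0],
    [2.0, 0.0],
]


def shaped_reward(r, v_next, gamma, kappa):
    """Surrogate reward for one transition (assumed form, see lead-in)."""
    return r + gamma * (1.0 - kappa) * v_next


def solve_surrogate(P, R, V, gamma, kappa, sweeps=500):
    """Value iteration on the surrogate MDP; returns its greedy policy."""
    n_s, n_a = len(R), len(R[0])
    # Expected shaped reward for every state-action pair.
    Rs = [
        [
            sum(P[s][a][t] * shaped_reward(R[s][a], V[t], gamma, kappa)
                for t in range(n_s))
            for a in range(n_a)
        ]
        for s in range(n_s)
    ]
    disc = gamma * kappa  # the smaller discount factor of the surrogate

    def q(s, a, U):
        return Rs[s][a] + disc * sum(P[s][a][t] * U[t] for t in range(n_s))

    U = [0.0] * n_s
    for _ in range(sweeps):
        U = [max(q(s, a, U) for a in range(n_a)) for s in range(n_s)]
    return [max(range(n_a), key=lambda a, s=s: q(s, a, U)) for s in range(n_s)]


def evaluate(P, R, policy, gamma, sweeps=2000):
    """Iterative policy evaluation in the original MDP."""
    n_s = len(R)
    V = [0.0] * n_s
    for _ in range(sweeps):
        V = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n_s))
            for s in range(n_s)
        ]
    return V


def kappa_pi(P, R, gamma, kappa, iters=20):
    """Alternate between solving the surrogate and evaluating the result."""
    V = [0.0] * len(R)
    policy = None
    for _ in range(iters):
        policy = solve_surrogate(P, R, V, gamma, kappa)
        V = evaluate(P, R, policy, gamma)
    return policy, V


if __name__ == "__main__":
    for kappa in (0.0, 0.5, 1.0):
        print(kappa, kappa_pi(P, R, GAMMA, kappa))
```

Under this assumed shaping, $κ = 0$ reduces the surrogate to a one-step greedy choice (ordinary policy iteration), while $κ = 1$ makes the surrogate identical to the original problem. Intermediate values trade off how much planning is done inside each iteration.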

