LG ML, Jun 3, 2019

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Harm van Seijen, Mehdi Fatemi, Arash Tavakoli

arXiv:1906.00572v2, 12.235 citations, Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in reinforcement learning: approximate methods perform poorly when the discount factor is low. Removing that bottleneck gives researchers and practitioners a way to approach a class of problems that are hard to solve with traditional methods. The advance is incremental in nature.

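The paper tackles the poor performance of low discount factors in reinforcement learning. It identifies the variation in action-gap size across the state space, rather than small action-gaps as such, as the primary cause, and introduces a method that maps value estimates to a logarithmic space to make action-gaps more homogeneous. The authors prove convergence under standard assumptions and show empirically that the method enables lower discount factors for approximate reinforcement-learning methods.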

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis that identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.
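The action-gap argument in the abstract can be illustrated with a small sketch. The example below is not taken from the paper: it assumes a hypothetical deterministic chain with a single reward of 1 at the goal and two actions per state (`forward` and `stay`), and it uses a plain natural logarithm rather than whatever exact mapping the authors define. Under those assumptions, it compares how much the action-gap varies across states in the regular value space and in log space.

```python
import math

def chain_q_values(n_states, gamma):
    """Optimal Q-values for an assumed toy chain (not from the paper).

    'forward' moves one state closer to the goal; reaching the goal pays 1.
    'stay' wastes one step with zero reward, then acts optimally.
    """
    q = []
    for s in range(n_states):
        steps_to_goal = n_states - s               # forward moves still needed
        q_forward = gamma ** (steps_to_goal - 1)   # discounted goal reward
        q_stay = gamma * q_forward                 # same return, one step later
        q.append((q_forward, q_stay))
    return q

def action_gaps(q, transform=lambda x: x):
    """Gap between the best and second-best action, after an optional mapping."""
    return [transform(q_fwd) - transform(q_stay) for q_fwd, q_stay in q]

q = chain_q_values(n_states=10, gamma=0.5)
linear_gaps = action_gaps(q)
log_gaps = action_gaps(q, transform=math.log)

# Ratio of largest to smallest gap across the state space.
linear_spread = max(linear_gaps) / min(linear_gaps)
log_spread = max(log_gaps) / min(log_gaps)
print(f"linear-space gap spread: {linear_spread:.1f}")
print(f"log-space gap spread:    {log_spread:.6f}")
```

In this toy chain the regular-space gap shrinks geometrically with distance from the goal, while the log-space gap equals -ln(gamma) in every state. Real problems with stochastic transitions or negative rewards need more care than a bare logarithm provides, so this is only an intuition aid.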
