LG · ML · Apr 10, 2018

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski

arXiv:1804.03334v1 · 9.417 citations

Originality: Incremental advance

AI Analysis

This work addresses the need to tune step-sizes by hand in reinforcement learning by adapting them automatically, though it is an incremental advance: it generalizes an existing supervised learning method (IDBD) to TD learning.
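For reference, the supervised method being generalized is IDBD. Below is a minimal Python sketch of IDBD for linear regression in its commonly cited form: each weight `w_i` has a log step-size `beta_i` adjusted by meta-descent with meta step-size `theta`, plus a trace `h_i` of how `beta_i` has recently moved `w_i`. The class name, default values, and the toy regression target are illustrative assumptions, not taken from this paper.

```python
import math
import random


class IDBD:
    """Minimal sketch of IDBD for linear regression (illustrative, not from this paper)."""

    def __init__(self, n_features, theta=0.001, init_alpha=0.05):
        self.theta = theta                               # meta step-size
        self.w = [0.0] * n_features                      # weights
        self.beta = [math.log(init_alpha)] * n_features  # log step-sizes, one per feature
        self.h = [0.0] * n_features                      # trace of beta_i's effect on w_i

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        delta = y - self.predict(x)                      # supervised prediction error
        for i, xi in enumerate(x):
            # Meta-descent: raise beta_i when the current step and recent steps agree in sign.
            self.beta[i] += self.theta * delta * xi * self.h[i]
            alpha = math.exp(self.beta[i])               # this feature's step-size
            self.w[i] += alpha * delta * xi              # LMS step with a per-feature step-size
            # Decay h_i (factor clipped at zero) and add this step's change to w_i.
            self.h[i] = self.h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
        return delta


# Toy usage (hypothetical data): noiseless linear target with one irrelevant feature.
rng = random.Random(0)
true_w = [2.0, -1.0, 0.0]
learner = IDBD(len(true_w))
for _ in range(20000):
    x = [rng.uniform(-1.0, 1.0) for _ in true_w]
    learner.update(x, sum(a * b for a, b in zip(true_w, x)))
print([round(wi, 3) for wi in learner.w])
```

Working in log space keeps every step-size positive, and the clipped decay on `h_i` stops a large step-size from flipping the sign of the trace.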

The paper tackles the problem of automatically adapting step-sizes in temporal-difference (TD) learning by introducing TIDBD, a vectorized adaptive step-size method. TIDBD outperforms ordinary TD and scalar step-size adaptation on stationary and non-stationary prediction tasks, and outperforms ordinary TD, AlphaBound, and RMSprop on a real-world robot prediction task.

In this paper, we introduce a method for adapting the step-sizes of temporal-difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size allows finer-grained optimization by giving each feature its own step-size. Furthermore, letting different weights learn at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD), a vectorized adaptive step-size method for supervised learning, to TD learning, and we name the result TIDBD. We demonstrate that TIDBD finds appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation; that it can differentiate between features that are relevant and irrelevant to a given task, thereby performing representation learning; and, on a real-world robot prediction task, that TIDBD outperforms ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.
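To make the generalization concrete, here is a minimal sketch of what a semi-gradient TIDBD(λ) update for linear TD prediction can look like. It assumes the method keeps IDBD's per-feature log step-sizes and swaps the supervised error for the TD error δ and the input vector for an accumulating eligibility trace `z`. The class `TIDBDSketch`, its defaults, and the three-state cycle used to exercise it are hypothetical; the exact meta-update (including any full-gradient variant) should be checked against the paper before relying on it.

```python
import math


class TIDBDSketch:
    """Hypothetical sketch of semi-gradient TIDBD(lambda) for linear TD prediction.

    Assumption: IDBD's meta-update with the TD error in place of the supervised
    error and an accumulating eligibility trace in place of the input vector.
    """

    def __init__(self, n_features, gamma, lam, theta=0.01, init_alpha=0.1):
        self.gamma, self.lam, self.theta = gamma, lam, theta
        self.w = [0.0] * n_features                      # value-function weights
        self.beta = [math.log(init_alpha)] * n_features  # log step-sizes, one per feature
        self.h = [0.0] * n_features                      # trace of beta_i's effect on w_i
        self.z = [0.0] * n_features                      # accumulating eligibility trace

    def value(self, phi):
        return sum(wi * pi for wi, pi in zip(self.w, phi))

    def update(self, phi, reward, phi_next):
        # TD error for the transition phi -> phi_next.
        delta = reward + self.gamma * self.value(phi_next) - self.value(phi)
        for i, p in enumerate(phi):
            # Semi-gradient meta-step: uses phi_i and ignores the next-state term.
            self.beta[i] += self.theta * delta * p * self.h[i]
            alpha = math.exp(self.beta[i])
            self.z[i] = self.gamma * self.lam * self.z[i] + p
            self.w[i] += alpha * delta * self.z[i]
            # Decay h_i (factor clipped at zero, as in IDBD) and add this step's change.
            self.h[i] = self.h[i] * max(0.0, 1.0 - alpha * p * self.z[i]) + alpha * delta * self.z[i]
        return delta

    def step_sizes(self):
        return [math.exp(b) for b in self.beta]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


# Hypothetical task: deterministic cycle s0 -> s1 -> s2 -> s0, reward 1 on leaving s0.
rewards = [1.0, 0.0, 0.0]
agent = TIDBDSketch(3, gamma=0.9, lam=0.0)
s = 0
for _ in range(30000):
    s_next = (s + 1) % 3
    agent.update(one_hot(s, 3), rewards[s], one_hot(s_next, 3))
    s = s_next
print([round(v, 3) for v in agent.w], [round(a, 3) for a in agent.step_sizes()])
```

With one-hot features each weight is a state value, so the weights should approach the discounted returns of the cycle; `agent.step_sizes()` shows how far meta-descent moved each feature's step-size from its initial 0.1.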

