LG | ML | Apr 10, 2018

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent

arXiv:1804.03334v1 | 17 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need to tune step-sizes by hand in reinforcement learning by adapting them automatically during learning. The contribution is incremental: it generalizes an existing supervised learning method to TD learning.

The paper tackles the problem of automatically adapting step-sizes in temporal difference (TD) learning by introducing TIDBD, a vectorized adaptive step-size method that outperforms ordinary TD and scalar adaptation methods in stationary and non-stationary tasks, including a real-world robot prediction task.

In this paper, we introduce a method for adapting the step-sizes of temporal difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD), a vectorized adaptive step-size method for supervised learning, to TD learning, and name the resulting method TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks, outperforming ordinary TD methods and TD methods with scalar step-size adaptation; we demonstrate that it can differentiate between features which are relevant and irrelevant for a given task, performing representation learning; and we show on a real-world robot prediction task that TIDBD is able to outperform ordinary TD methods and TD methods augmented with AlphaBound and RMSprop.
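The abstract does not spell out the update rule. As a rough illustration of what per-feature step-size adaptation through meta-descent can look like for linear TD(λ), here is a minimal sketch that assumes a semi-gradient, IDBD-style update with accumulating traces. The names `tidbd_step` and `demo`, the parameter names, and the initial values are illustrative assumptions, not taken from this page, and details may differ from the algorithm in the paper.

```python
import math


def tidbd_step(w, beta, h, z, phi, phi_next, reward, gamma, lam, theta):
    """One IDBD-style update for linear TD(lambda); mutates w, beta, h, z in place.

    w        weights, one per feature
    beta     log step-sizes, so each step-size is exp(beta[i]) > 0
    h        decaying memory of recent weight updates (used by the meta-update)
    z        accumulating eligibility traces
    theta    meta step-size (assumed hyperparameter)
    Returns the TD error for this transition.
    """
    v = sum(wi * xi for wi, xi in zip(w, phi))
    v_next = sum(wi * xi for wi, xi in zip(w, phi_next))
    delta = reward + gamma * v_next - v

    for i in range(len(w)):
        # Meta-descent: raise the step-size when the current error
        # correlates with recent updates to this weight, lower it otherwise.
        beta[i] += theta * delta * phi[i] * h[i]
        alpha = math.exp(beta[i])

        # Ordinary TD(lambda) update, but with a per-feature step-size.
        z[i] = gamma * lam * z[i] + phi[i]
        w[i] += alpha * delta * z[i]

        # Update the memory trace; the decay factor is clipped at zero.
        h[i] = h[i] * max(0.0, 1.0 - alpha * phi[i] * z[i]) + alpha * delta * z[i]

    return delta


def demo(steps=200):
    """Toy problem: one feature, reward 1, terminal next state (all-zero features)."""
    w, beta, h, z = [0.0], [math.log(0.1)], [0.0], [0.0]
    for _ in range(steps):
        tidbd_step(w, beta, h, z, [1.0], [0.0], 1.0, 0.9, 0.0, 0.01)
    return w[0], math.exp(beta[0])
```

In the toy `demo`, the error keeps the same sign on every step, so the single step-size drifts upward from its initial value while the weight approaches the target. A scalar-adaptation method would apply the same change to every weight, whereas here each feature keeps its own `beta[i]`.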

