ML LG Dec 21, 2014

Implicit Temporal Differences

Aviv Tamar, Panos Toulis, Shie Mannor, Edoardo M. Airoldi

arXiv:1412.6734v1, 1 citation

Originality: Incremental advance

AI Analysis

This work addresses a practical stability issue in policy evaluation for reinforcement learning, offering an incremental improvement for researchers and practitioners dealing with large-scale problems.

The paper tackles the sensitivity of the TD(λ) algorithm in reinforcement learning to the choice of step-size, which affects both stability and convergence. It introduces implicit TD(λ), which offers improved stability at a similar computational cost. Results show that it outperforms standard TD(λ) and a state-of-the-art automatic step-size tuning method on benchmark tasks.

In reinforcement learning, the TD(λ) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD(λ) is its sensitivity to the choice of the step-size. It is an empirically well-known fact that a large step-size leads to fast convergence, at the cost of higher variance and risk of instability. In this work, we introduce the implicit TD(λ) algorithm, which has the same function and computational cost as TD(λ), but is significantly more stable. We provide a theoretical explanation of this stability and an empirical evaluation of implicit TD(λ) on typical benchmark tasks. Our results show that implicit TD(λ) outperforms standard TD(λ) and a state-of-the-art method that automatically tunes the step-size, and thus shows promise for wide applicability.
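
The abstract does not state the update rule, so the following is only a minimal sketch of the step-size sensitivity it describes, under stated assumptions: λ = 0, linear function approximation, and an implicit update in which the current state's value is evaluated at the new parameter vector. Under that assumption the implicit step has a closed form that simply rescales the step-size by 1 / (1 + α‖φ‖²). The function names and the one-state toy problem are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td0_step(theta, phi, reward, phi_next, alpha, gamma):
    """Standard TD(0) update with linear value estimate V(s) = phi(s) . theta."""
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    return [t + alpha * delta * p for t, p in zip(theta, phi)]


def implicit_td0_step(theta, phi, reward, phi_next, alpha, gamma):
    """Assumed implicit variant: solves
        theta_new = theta + alpha * (r + gamma * phi_next . theta - phi . theta_new) * phi
    in closed form, which shrinks the step-size by 1 / (1 + alpha * ||phi||^2).
    """
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    effective_alpha = alpha / (1.0 + alpha * dot(phi, phi))
    return [t + effective_alpha * delta * p for t, p in zip(theta, phi)]


def run(step_fn, n_steps=50, alpha=1.5):
    """Hypothetical toy problem: one state with a self-loop, reward 1, gamma 0.5.

    The true value is 1 / (1 - 0.5) = 2; with feature phi = [2.0] the
    exact weight is therefore theta = 1.0.
    """
    theta = [0.0]
    phi = [2.0]
    for _ in range(n_steps):
        theta = step_fn(theta, phi, 1.0, phi, alpha, 0.5)
    return theta[0]


if __name__ == "__main__":
    print("standard TD(0):", run(td0_step))
    print("implicit TD(0):", run(implicit_td0_step))
```

With the deliberately large step-size used here, the standard update overshoots on every step and its error grows, while the assumed implicit update contracts toward the exact weight. This is only meant to illustrate the kind of stability gap the abstract reports, not to reproduce the paper's experiments.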
