AI LG · Mar 1, 2019

Should All Temporal Difference Learning Use Emphasis?

Xiang Gu, Sina Ghiassian, Richard S. Sutton

arXiv:1903.00194v1 · 5.14 citations

Originality: Incremental advance

AI Analysis

This work addresses convergence problems in reinforcement learning and is relevant to both researchers and practitioners. It is an incremental advance, building on prior work that suggested ETD as a substitute for TD.

The paper addresses convergence problems in Temporal Difference (TD) learning. It shows empirically that Emphatic Temporal Difference (ETD) learning converges on several on-policy experiments where TD diverges or performs poorly, and that ETD outperforms TD on the mountain car prediction problem.

Emphatic Temporal Difference (ETD) learning has recently been proposed as a convergent off-policy learning method. ETD was proposed mainly to address the convergence issues of conventional Temporal Difference (TD) learning under off-policy training, but it differs from conventional TD learning even under on-policy training. A simple counterexample provided in 2017 pointed to a potential class of problems where ETD converges but TD diverges. In this paper, we empirically show that ETD converges on a few other well-known on-policy experiments where TD either diverges or performs poorly. We also show that ETD outperforms TD on the mountain car prediction problem. Our results, together with a similar pattern observed under off-policy training in prior work, suggest that ETD might be a good substitute for conventional TD.
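To illustrate how ETD differs from conventional TD even under on-policy training, here is a minimal sketch of linear TD(0) alongside the on-policy special case of ETD(0). It assumes importance-sampling ratio 1, λ = 0, and a constant discount γ and interest i. In this case the followon trace is F_t = γ F_{t-1} + i, and each ordinary TD(0) update is scaled by the emphasis M_t = F_t. This is not the authors' experimental code. The function names, the two-state cycle, and the step size are hypothetical choices for illustration. With tabular features both methods share the same fixed point, so the sketch shows only the update mechanics, not the divergence results reported in the paper.

```python
import numpy as np

def td0(trajectory, n_features, alpha, gamma):
    """Conventional on-policy linear TD(0)."""
    theta = np.zeros(n_features)
    for phi, reward, phi_next in trajectory:
        delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
        theta += alpha * delta * phi
    return theta

def etd0(trajectory, n_features, alpha, gamma, interest=1.0):
    """On-policy emphatic TD(0), assuming rho = 1, lambda = 0,
    and constant gamma and interest (hypothetical simplification)."""
    theta = np.zeros(n_features)
    followon = 0.0  # F_{-1} = 0, so F_0 = i(S_0)
    for phi, reward, phi_next in trajectory:
        followon = gamma * followon + interest  # F_t = gamma * F_{t-1} + i(S_t)
        emphasis = followon                     # M_t = F_t when lambda = 0
        delta = reward + gamma * theta @ phi_next - theta @ phi
        theta += alpha * emphasis * delta * phi  # TD(0) step scaled by emphasis
    return theta, followon

def cycle_trajectory(n_steps):
    """Hypothetical deterministic 2-state cycle A -> B -> A ... with reward 1
    on every transition and tabular (one-hot) features."""
    features = np.eye(2)
    trajectory, state = [], 0
    for _ in range(n_steps):
        nxt = 1 - state
        trajectory.append((features[state], 1.0, features[nxt]))
        state = nxt
    return trajectory

if __name__ == "__main__":
    traj = cycle_trajectory(5000)
    # The true value of each state is 1 / (1 - 0.9) = 10.
    theta_td = td0(traj, 2, alpha=0.05, gamma=0.9)
    theta_etd, F = etd0(traj, 2, alpha=0.05, gamma=0.9)
    print("TD(0): ", theta_td)
    print("ETD(0):", theta_etd, "followon trace:", F)
```

In this continuing task the followon trace approaches 1/(1 - γ), so ETD effectively uses a larger, state-weighted step size. With function approximation, this emphasis reweights states, which is why ETD and TD can behave differently under on-policy training.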

