LG, May 19, 2016

On a convergent off-policy temporal difference learning algorithm in on-line learning environment

arXiv:1605.06076v1
Originality: Synthesis-oriented
AI Analysis

This work addresses convergence guarantees for off-policy reinforcement learning algorithms, an incremental improvement of interest to machine learning researchers.

The paper provides a rigorous convergence analysis of the TDC algorithm with importance weighting for off-policy temporal difference learning in online environments, supporting the theoretical results with empirical tests on standard counterexamples.

In this paper we provide a rigorous convergence analysis of an off-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an online learning environment. The algorithm considered here is TDC with importance weighting, introduced by Maei et al. We support our theoretical results with empirical results on standard off-policy counterexamples.
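To make the algorithm in the abstract concrete, here is a minimal sketch of a single TDC update with importance weighting under linear function approximation. It follows the commonly stated form of the update and has not been checked against the paper's exact equations. The function names (`dot`, `tdc_step`) and all parameter names are hypothetical, chosen for illustration. Here `rho` is assumed to be the importance weight (target-policy probability divided by behaviour-policy probability of the action taken), and `alpha` and `beta` are the step sizes of the main and auxiliary weight vectors.

```python
def dot(u, v):
    """Inner product of two equal-length feature/weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One importance-weighted TDC update (illustrative sketch, assumed form).

    theta    : main weight vector (value estimate is dot(theta, phi))
    w        : auxiliary weight vector used for the gradient correction
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance weight pi(a|s) / mu(a|s), assumed supplied by caller
    """
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)

    # Main update: importance-weighted TD term plus the gradient correction.
    new_theta = [
        t + alpha * rho * (delta * p - gamma * w_phi * pn)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    # Auxiliary update: tracks the importance-weighted TD error along phi.
    new_w = [wi + beta * (rho * delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


# Example call with two features and made-up numbers.
theta, w = tdc_step(
    theta=[0.0, 0.0], w=[0.0, 0.0],
    phi=[1.0, 0.0], phi_next=[0.0, 1.0],
    reward=1.0, rho=2.0, gamma=0.9, alpha=0.1, beta=0.5,
)
```

Each step uses only inner products and elementwise vector operations, so its cost grows linearly with the number of features, which is the per-time-step linear complexity the abstract refers to.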

