LG · Apr 21, 2015

Temporal-Difference Networks

arXiv:1504.05539v1 · 98 citations
Originality Highly original
AI Analysis

This work addresses a foundational problem in reinforcement learning: it extends TD methods to a broader class of predictions. The extension builds incrementally on existing TD learning, but it is a substantial one for researchers in AI and machine learning.

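The authors address a limitation of conventional temporal-difference (TD) learning by generalizing it to networks of interrelated predictions. These networks can learn to predict by a fixed interval, show a particularly pronounced efficiency advantage over Monte Carlo methods when the inter-predictive relationships are conditioned on action, and learn predictive state representations that solve a non-Markov problem exactly.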

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that if the inter-predictive relationships are made conditional on action, then the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
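The idea of relating each prediction to other predictions at a later time can be illustrated with a small sketch of fixed-interval prediction. This is not the authors' implementation: it assumes a fully observable, tabular setting (the paper's networks instead compute predictions from observations and earlier predictions), a reflecting 7-state random walk, and an observation that is 1 only in the rightmost state. The function names, the `1/n` step size, and all parameter values are hypothetical choices made for illustration. Node 0 is trained toward the next observation, and node k is trained toward node k-1 evaluated one step later, so node k estimates the observation exactly k+1 steps ahead.

```python
import random


def clip(s, n_states):
    """Keep a state index inside [0, n_states - 1] (reflecting walls)."""
    return min(max(s, 0), n_states - 1)


def learn_fixed_interval(n_states=7, horizon=3, n_steps=200_000, seed=0):
    """Chain of predictions: y[k][s] estimates the observation k+1 steps after s."""
    rng = random.Random(seed)
    # Assumed observation: 1 in the rightmost state, 0 elsewhere.
    obs = [1.0 if s == n_states - 1 else 0.0 for s in range(n_states)]
    y = [[0.0] * n_states for _ in range(horizon)]
    visits = [0] * n_states
    s = n_states // 2
    for _ in range(n_steps):
        s_next = clip(s + rng.choice((-1, 1)), n_states)
        visits[s] += 1
        alpha = 1.0 / visits[s]  # sample-average step size (an assumption)
        # Node 0 targets the next observation; node k targets node k-1
        # evaluated at the next state. Targets are computed before any update.
        targets = [obs[s_next]] + [y[k - 1][s_next] for k in range(1, horizon)]
        for k in range(horizon):
            y[k][s] += alpha * (targets[k] - y[k][s])
        s = s_next
    return y


def exact_fixed_interval(n_states=7, horizon=3):
    """Exact k-step-ahead expectations, by applying the transition rule k times."""
    v = [1.0 if s == n_states - 1 else 0.0 for s in range(n_states)]
    out = []
    for _ in range(horizon):
        v = [0.5 * v[clip(s - 1, n_states)] + 0.5 * v[clip(s + 1, n_states)]
             for s in range(n_states)]
        out.append(v)
    return out


if __name__ == "__main__":
    learned = learn_fixed_interval()
    exact = exact_fixed_interval()
    for k, (row_l, row_e) in enumerate(zip(learned, exact), start=1):
        print(k, [round(a, 2) for a in row_l], [round(b, 2) for b in row_e])
```

Each node here bootstraps from a different node rather than from itself, which is the feature that lets the chain represent a prediction at a fixed horizon. A single conventional TD prediction, whose target is its own later value, has no such chain to draw on.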

