LG · AI · Oct 27, 2020

Generative Temporal Difference Learning for Infinite-Horizon Prediction

arXiv:2010.14496v4 · 52 citations
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in reinforcement learning: agents that must plan over extended horizons. It offers a hybrid approach that bridges model-free and model-based methods. The advance is incremental, since it builds on existing concepts such as the successor representation.

The paper tackles long-term environment prediction in reinforcement learning by introducing the $\gamma$-model, a predictive model with an infinite probabilistic horizon. The $\gamma$-model generalizes model-based control procedures and is trained using a generative reinterpretation of temporal difference learning. The paper reports that it is useful in both prediction and control tasks.

We introduce the $γ$-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with $γ$-models leads to generalizations of the procedures central to model-based control, including the model rollout and model-based value estimation. The $γ$-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the $γ$-model as both a generative adversarial network and normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.
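
To make the "generative reinterpretation of temporal difference learning" concrete, the sketch below works through a tabular analogue. It is an illustration under simplifying assumptions, not the paper's method: the paper's $γ$-model is continuous, conditioned on state and action, and instantiated as a GAN or normalizing flow, whereas this example uses a small finite Markov chain under a fixed policy. The transition matrix `P` and the function names `gamma_model_tabular` and `truncated_occupancy` are hypothetical. The first function repeatedly applies a bootstrapped update that mixes the single-step distribution with the model's own prediction from the next state; the second sums the geometrically discounted multi-step distributions directly, so the two can be compared.

```python
def gamma_model_tabular(P, gamma, iters=500):
    """Bootstrapped (TD-style) update for a finite Markov chain.

    Iterates mu <- (1 - gamma) * P + gamma * P @ mu, where mu[s][e] is the
    probability of state e under a geometrically discounted future horizon
    starting from state s. P is assumed to already include the policy.
    """
    n = len(P)
    # Any row-stochastic initialisation works; uniform keeps rows normalised.
    mu = [[1.0 / n] * n for _ in range(n)]
    for _ in range(iters):
        mu = [
            [
                (1.0 - gamma) * P[s][e]
                + gamma * sum(P[s][k] * mu[k][e] for k in range(n))
                for e in range(n)
            ]
            for s in range(n)
        ]
    return mu


def truncated_occupancy(P, gamma, horizon=500):
    """Direct sum (1 - gamma) * sum_{k>=1} gamma^(k-1) * P^k, truncated."""
    n = len(P)
    Pk = [row[:] for row in P]  # holds P^k, starting at k = 1
    total = [[0.0] * n for _ in range(n)]
    weight = 1.0 - gamma
    for _ in range(horizon):
        for s in range(n):
            for e in range(n):
                total[s][e] += weight * Pk[s][e]
        # Advance to the next power of P and the next discount weight.
        Pk = [
            [sum(Pk[s][k] * P[k][e] for k in range(n)) for e in range(n)]
            for s in range(n)
        ]
        weight *= gamma
    return total


# Hypothetical 3-state chain, used only for illustration.
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.5, 0.5],
    [0.2, 0.0, 0.8],
]
mu = gamma_model_tabular(P, gamma=0.9)
direct = truncated_occupancy(P, gamma=0.9)
max_gap = max(abs(mu[s][e] - direct[s][e]) for s in range(3) for e in range(3))
```

In this toy setting the bootstrapped iteration and the direct discounted sum agree up to numerical precision, and no reward appears anywhere in the computation. That mirrors the abstract's description: the model carries long-horizon information like a value function while staying independent of task reward.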

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
