LG · Jul 25, 2015

True Online Emphatic TD($λ$): Quick Reference and Implementation Guide

arXiv:1507.07147v1 · 1 citation

Originality: Synthesis-oriented

AI Analysis

This is an incremental guide for researchers implementing reinforcement learning algorithms, focusing on a specific combination of existing methods.

The paper provides an implementation guide for true online emphatic TD($λ$), a model-free temporal-difference algorithm that combines the emphasis and true-online ideas to make long-term predictions with linear function approximation and off-policy training.

This document is a guide to implementing true online emphatic TD($λ$), a model-free temporal-difference algorithm for learning to make long-term predictions that combines the emphasis idea (Sutton, Mahmood & White 2015) with the true-online idea (van Seijen & Sutton 2014). The setting considered here includes linear function approximation, the possibility of off-policy training, the full generality of general value functions, and the emphasis algorithm's notion of "interest".
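To make the setting above concrete, here is a minimal Python sketch of one learner step. It is not the paper's own pseudocode. The class name, method signature, and variable names (`theta`, `e`, `F`, `M`, `rho`, `interest`) are hypothetical. The update rules are an assumed form, obtained by substituting the emphasis-scaled step size into the true-online TD($λ$) update, and should be checked against the guide itself before use.

```python
import numpy as np


class TrueOnlineEmphaticTD:
    """Sketch of a linear true online emphatic TD(lambda) learner.

    Assumed update form (verify against the paper):
      delta = R + gamma_next * theta.phi_next - theta.phi
      F     = rho_prev * gamma * F + interest
      M     = lam * interest + (1 - lam) * F
      e     = rho*gamma*lam*e + rho*alpha*M*(1 - rho*gamma*lam*(phi.e))*phi
      theta = theta + delta*e + (e - alpha*M*rho*phi) * ((theta - theta_prev).phi)
    """

    def __init__(self, n_features, alpha):
        self.alpha = alpha
        self.theta = np.zeros(n_features)       # weight vector
        self.theta_prev = np.zeros(n_features)  # weights from the previous step
        self.e = np.zeros(n_features)           # eligibility trace
        self.F = 0.0                            # followon trace
        self.rho_prev = 1.0                     # irrelevant on the first step since F = 0

    def step(self, phi, reward, phi_next, rho, gamma, gamma_next, lam, interest):
        """Apply one update for a transition from features phi to phi_next."""
        delta = reward + gamma_next * (self.theta @ phi_next) - self.theta @ phi

        # Followon trace and emphasis for the current state.
        self.F = self.rho_prev * gamma * self.F + interest
        M = lam * interest + (1.0 - lam) * self.F

        # Dutch-style trace; phi @ e uses the trace from before this update.
        decay = rho * gamma * lam
        e_dot_phi = self.e @ phi
        self.e = decay * self.e + rho * self.alpha * M * (1.0 - decay * e_dot_phi) * phi

        # True-online correction term uses the change in weights on the last step.
        correction = (self.theta - self.theta_prev) @ phi
        theta_new = (self.theta + delta * self.e
                     + (self.e - self.alpha * M * rho * phi) * correction)

        self.theta_prev = self.theta
        self.theta = theta_new
        self.rho_prev = rho
        return delta
```

In this sketch, setting `rho=1.0` on every step corresponds to on-policy training, and setting `interest=1.0` everywhere weights all states equally. State-dependent `gamma` and `lam` are passed per step to reflect the general-value-function setting.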
