LG · ML · Dec 21, 2019

Exploring TD error as a heuristic for $σ$ selection in Q($σ$, $λ$)

arXiv:1912.10316v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for reinforcement learning researchers focusing on TD algorithms.

The paper addresses the problem of selecting the parameter $σ$ in the Q($σ$, $λ$) algorithm, which controls the balance between sampling and expectation in TD backups, by exploring a TD-error-based heuristic. No concrete results or numbers are reported.

Among TD algorithms, Q($σ$, $λ$) can perform a multistep backup online while unifying sampling with taking the expectation across all actions for a state. The parameter $σ \in [0, 1]$ indicates the extent to which sampling is used. Rather than being held constant or varied over time, the value of $σ$ can be selected based on characteristics of the current state. This report explores the viability of one such scheme, based on the TD error.
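To make the role of $σ$ concrete, the following is a minimal Python sketch of the standard one-step Q($σ$) backup target, in which $σ = 1$ gives a fully sampled (Sarsa-style) target and $σ = 0$ gives a full expectation (Expected Sarsa-style) target. The multistep, eligibility-trace ($λ$) part of the algorithm is omitted for brevity. The function `sigma_from_td_error`, its `scale` parameter, and the direction of the mapping (larger error means more sampling) are hypothetical choices made only for illustration; the abstract does not specify how the TD error is mapped to $σ$, so this should not be read as the paper's method.

```python
import numpy as np


def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) target: blend a sampled and an expected backup."""
    sampled = q_next[next_action]               # value of the action actually taken next
    expected = float(np.dot(pi_next, q_next))   # expectation over all next actions under pi
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(td_error, scale=1.0):
    """Hypothetical heuristic (an assumption, not from the paper):
    squash |td_error| into [0, 1) so that larger errors use more sampling."""
    return float(1.0 - np.exp(-abs(td_error) / scale))


# Toy next-state values and a uniform target policy over two actions.
q_next = np.array([1.0, 3.0])
pi_next = np.array([0.5, 0.5])

full_sample = q_sigma_target(1.0, 0.9, q_next, pi_next, next_action=1, sigma=1.0)
full_expect = q_sigma_target(1.0, 0.9, q_next, pi_next, next_action=1, sigma=0.0)

# State-dependent sigma: derive it from the fully sampled TD error at this step.
q_current = 2.0
sigma = sigma_from_td_error(full_sample - q_current)
blended = q_sigma_target(1.0, 0.9, q_next, pi_next, next_action=1, sigma=sigma)
```

Because the target is linear in $σ$, any heuristic that returns a value in $[0, 1]$ yields a target lying between the fully expected and fully sampled ones, so alternative mappings can be swapped in without changing the backup itself.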
