Accelerated Gradient Temporal Difference Learning
This work addresses the problem of improving reinforcement learning algorithms for researchers and practitioners by offering a more efficient and robust alternative to existing TD methods, though it appears incremental as it builds on prior gradient TD developments.
The paper tackles the trade-off between computational efficiency and data efficiency in temporal difference learning by proposing accelerated gradient TD methods, which achieve similar data efficiency to least-squares methods with reduced computation and storage, and show reduced parameter sensitivity compared to linear TD methods while being asymptotically unbiased, as demonstrated through convergence proofs and experiments on benchmark and industrial domains.
The family of temporal difference (TD) methods span a spectrum from computationally frugal linear methods like TD(λ) to data efficient least squares methods. Least square methods make the best use of available data directly computing the TD solution and thus do not require tuning a typically highly sensitive learning rate parameter, but require quadratic computation and storage. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods, at a fraction of the computation and storage (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain.