LG, AI, SY, OC, ML | Jul 11, 2024

PID Accelerated Temporal Difference Algorithms

arXiv:2407.08803v2 | 2 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work targets the slow convergence of reinforcement learning on long-horizon tasks and offers a method to speed it up. It appears incremental because it extends the earlier PID VI work from the setting with known transition distributions to the sample-based setting.

The authors tackle slow convergence in long-horizon reinforcement learning by introducing PID TD Learning and PID Q-Learning, which use ideas from control theory to accelerate Temporal Difference learning. They analyze the convergence of PID TD Learning and its acceleration over conventional TD Learning, and they empirically verify a method for adapting the PID gains in the presence of noise.

Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to the conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
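
The effect of a large discount factor, and the kind of speed-up a control-style correction can give, can be seen on a toy problem. The sketch below is an illustration only and is not taken from the paper: it uses the model-based setting (known transitions) rather than samples, a hypothetical two-state chain, hand-picked gains kappa_p and kappa_d, and only proportional and derivative terms (the integral term and the gain adaptation are omitted). The exact updates in PID VI and PID TD Learning may differ from this simplified form.

```python
# Toy sketch: plain Value Iteration vs. a PD-style variant for policy evaluation.
# The chain, rewards, and gains are assumptions chosen for illustration.

def bellman(V, r, P, gamma):
    """Policy-evaluation Bellman operator: (T V)(s) = r(s) + gamma * sum_t P[s][t] * V[t]."""
    n = len(V)
    return [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]

def iterations_to_converge(r, P, gamma, kappa_p=1.0, kappa_d=0.0,
                           tol=1e-6, max_iters=100_000):
    """Count iterations until the Bellman residual max|T V - V| falls below tol.

    kappa_p=1 and kappa_d=0 recover plain Value Iteration.
    """
    V = [0.0] * len(r)
    V_prev = list(V)
    for k in range(max_iters):
        TV = bellman(V, r, P, gamma)
        if max(abs(tv - v) for tv, v in zip(TV, V)) < tol:
            return k
        # Proportional term mixes V and T V; derivative term adds V_k - V_{k-1}.
        V_next = [(1 - kappa_p) * v + kappa_p * tv + kappa_d * (v - vp)
                  for v, tv, vp in zip(V, TV, V_prev)]
        V_prev, V = V, V_next
    return max_iters

# Hypothetical two-state Markov reward process.
P = [[0.9, 0.1],
     [0.1, 0.9]]
r = [1.0, 0.0]

vi_90 = iterations_to_converge(r, P, gamma=0.90)                # plain VI, short horizon
vi_99 = iterations_to_converge(r, P, gamma=0.99)                # plain VI, long horizon
p_99 = iterations_to_converge(r, P, gamma=0.99, kappa_p=5.0)    # proportional gain only
pd_99 = iterations_to_converge(r, P, gamma=0.99, kappa_p=5.0, kappa_d=0.5)

print(vi_90, vi_99, p_99, pd_99)
```

On this chain, plain Value Iteration needs roughly ten times more iterations at a discount factor of 0.99 than at 0.9, and the hand-picked gains reduce the count at 0.99. The gains here work because the chain is tiny and its structure is known; poorly chosen gains can make the iteration diverge, which is part of why a method for adapting the gains matters.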

