LG · Jan 31, 2025

True Online TD-Replan(lambda) Achieving Planning through Replaying

arXiv:2501.19027v1 · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses planning efficiency through experience replay in reinforcement learning. It is an incremental advance: it builds on existing true online TD methods and adds a new replay mechanism.

The paper introduces True Online TD-Replan(λ), which extends true online TD so that an agent can replay past experience online, with the λ parameter also controlling how densely replay occurs. On problems that benefit from experience replay, the method outperforms true online TD(λ), at the cost of quadratic complexity due to replay. It also outperforms methods of similar quadratic complexity, such as Dyna Planning and TD(λ)-Replan, on the benchmark environments.

In this paper, we develop a new planning method that extends the capabilities of true online TD to allow an agent to efficiently replay all or part of its past experience, online and in the order in which it occurred, either at every step or sparsely according to the usual λ parameter. In this new method, which we call True Online TD-Replan(λ), the λ parameter plays a new role in specifying the density of the replay process, in addition to its usual role of specifying the depth of the target's updates. We demonstrate that, for problems that benefit from experience replay, our new method outperforms true online TD(λ), albeit at quadratic complexity due to its replay capabilities. In addition, we demonstrate that our method outperforms other methods with similar quadratic complexity, such as the Dyna Planning and TD(λ)-Replan algorithms. To showcase its capabilities, we test our method on two benchmarking environments: a random walk problem that uses simple binary features, and a myoelectric control domain that uses both simple sEMG features and deeply extracted features.
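For orientation, the baseline that the abstract builds on can be sketched in a few lines. The code below is a minimal, pure-Python sketch of the standard true online TD(λ) update with linear features, applied to a small random walk with one-hot (binary) features. It is not the paper's True Online TD-Replan(λ) algorithm. The `replay=True` branch is only a naive stand-in that re-sweeps every stored episode in order, included to illustrate why replaying past experience makes the total number of updates grow quadratically with the amount of experience. The function names, the five-state walk, and all hyperparameters are assumptions made for illustration.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def true_online_td_step(w, z, v_old, x, r, x_next, alpha, gamma, lam):
    """One true online TD(lambda) update with linear features.

    Updates the weights w and the dutch trace z in place and returns the
    value estimate of the next state, to be passed back in as v_old.
    """
    v = dot(w, x)
    v_next = dot(w, x_next)
    delta = r + gamma * v_next - v
    zx = dot(z, x)  # computed from the trace before it is decayed
    for i in range(len(w)):
        z[i] = gamma * lam * z[i] + (1.0 - alpha * gamma * lam * zx) * x[i]
    for i in range(len(w)):
        w[i] += alpha * (delta + v - v_old) * z[i] - alpha * (v - v_old) * x[i]
    return v_next


def sweep(w, transitions, alpha, gamma, lam):
    """Run the update over one episode's transitions, in their original order."""
    z = [0.0] * len(w)
    v_old = 0.0
    for x, r, x_next in transitions:
        v_old = true_online_td_step(w, z, v_old, x, r, x_next, alpha, gamma, lam)


def random_walk_episode(n_states, rng):
    """Hypothetical random walk: start in the middle, reward 1 on the right exit."""
    zero = [0.0] * n_states  # terminal states have all-zero features
    s = n_states // 2
    transitions = []
    while True:
        s_next = s + rng.choice((-1, 1))
        if s_next < 0:
            transitions.append((one_hot(s, n_states), 0.0, zero))
            return transitions
        if s_next >= n_states:
            transitions.append((one_hot(s, n_states), 1.0, zero))
            return transitions
        transitions.append((one_hot(s, n_states), 0.0, one_hot(s_next, n_states)))
        s = s_next


def run(n_episodes=100, n_states=5, alpha=0.05, gamma=1.0, lam=0.8,
        replay=False, seed=0):
    rng = random.Random(seed)
    w = [0.0] * n_states
    history = []
    updates = 0
    for _ in range(n_episodes):
        episode = random_walk_episode(n_states, rng)
        sweep(w, episode, alpha, gamma, lam)
        updates += len(episode)
        if replay:
            # Naive illustration only (NOT the paper's method): re-sweep
            # every stored episode, so work per episode grows with history.
            for past in history:
                sweep(w, past, alpha, gamma, lam)
                updates += len(past)
        history.append(episode)
    return w, updates


if __name__ == "__main__":
    w_plain, n_plain = run(replay=False)
    w_replay, n_replay = run(replay=True)
    print("no replay:", n_plain, "updates", [round(v, 2) for v in w_plain])
    print("replay:   ", n_replay, "updates", [round(v, 2) for v in w_replay])
```

Because both runs use the same seed, they see identical episodes, so the difference in update counts isolates the cost of the naive replay loop. How the paper's method schedules replay within an episode, and how λ sets the replay density, is not specified in the abstract and is not modeled here.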
