LG AI PR Oct 24, 2020

An Adiabatic Theorem for Policy Tracking with TD-learning

arXiv:2010.12848v2 1.2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of adapting reinforcement learning algorithms to dynamic environments, but it appears incremental as it builds on existing methods with new theoretical bounds.

The paper tackles the problem of using temporal difference learning to track the reward function of a policy that changes over time, and derives finite-time bounds for tabular TD-learning and Q-learning under time-varying policies.

We evaluate the ability of temporal difference learning to track the reward function of a policy as it changes over time. Our results apply a new adiabatic theorem that bounds the mixing time of time-inhomogeneous Markov chains. We derive finite-time bounds for tabular temporal difference learning and $Q$-learning when the policy used for training changes in time. To achieve this, we develop bounds for stochastic approximation under asynchronous adiabatic updates.
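The setting in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the two-state MDP, the linear policy drift, the step size, and the function names `td0_tracking` and `exact_values` are all assumptions chosen for illustration. It runs tabular TD(0) while the policy changes slowly, updating only the visited state at each step, and compares the estimate with the exact value function of the final policy.

```python
import random

# Hypothetical toy MDP (an assumption, not from the paper).
# P_NEXT1[s][a] is the probability of moving to state 1 from state s under action a.
P_NEXT1 = [[0.1, 0.9], [0.2, 0.8]]
# REWARD[s][a] is the reward for taking action a in state s.
REWARD = [[0.0, 1.0], [0.5, 2.0]]


def td0_tracking(num_steps=20000, step_size=0.05, gamma=0.9, drift=1e-4, seed=0):
    """Tabular TD(0) while the behaviour policy drifts slowly over time."""
    rng = random.Random(seed)
    values = [0.0, 0.0]
    prob_a1 = 0.1  # probability of choosing action 1 (same in both states)
    state = 0
    for _ in range(num_steps):
        action = 1 if rng.random() < prob_a1 else 0
        next_state = 1 if rng.random() < P_NEXT1[state][action] else 0
        td_error = REWARD[state][action] + gamma * values[next_state] - values[state]
        # Asynchronous update: only the currently visited state changes.
        values[state] += step_size * td_error
        state = next_state
        # Slow policy change, capped at 0.9.
        prob_a1 = min(0.9, prob_a1 + drift)
    return values, prob_a1


def exact_values(prob_a1, gamma=0.9):
    """Solve (I - gamma * P_pi) V = r_pi for the two-state chain by Cramer's rule."""
    r = [(1 - prob_a1) * REWARD[s][0] + prob_a1 * REWARD[s][1] for s in range(2)]
    q = [(1 - prob_a1) * P_NEXT1[s][0] + prob_a1 * P_NEXT1[s][1] for s in range(2)]
    a00, a01 = 1 - gamma * (1 - q[0]), -gamma * q[0]
    a10, a11 = -gamma * (1 - q[1]), 1 - gamma * q[1]
    det = a00 * a11 - a01 * a10
    return [(r[0] * a11 - a01 * r[1]) / det, (a00 * r[1] - a10 * r[0]) / det]


if __name__ == "__main__":
    estimate, final_prob = td0_tracking()
    print("TD(0) estimate:", estimate)
    print("Exact values for final policy:", exact_values(final_prob))
```

With a small `drift`, the estimate should stay close to the value function of the current policy; raising `drift` or lowering `step_size` makes the lag more visible. This is only a qualitative illustration and does not reproduce the paper's bounds.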

