LG · Jul 25, 2022

Online Reinforcement Learning for Periodic MDP

arXiv:2207.12045v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses reinforcement learning in non-stationary environments whose dynamics vary periodically, an incremental extension of existing methods for stationary MDPs.

The authors study learning in periodic Markov Decision Processes (MDPs), where transition probabilities and rewards vary periodically, and propose the PUCRL2 algorithm, whose regret scales linearly with the period and sub-linearly with the horizon length.

We study learning in periodic Markov Decision Processes (MDPs), a special type of non-stationary MDP in which both the state transition probabilities and the reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index, and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 grows linearly with the period and sub-linearly with the horizon length. Numerical results demonstrate the efficacy of PUCRL2.
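
The state augmentation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `augment_periodic_mdp`, the array layout (`P[n, s, a, s']`, `R[n, s, a]`), and the flattening of the pair (period index `n`, state `s`) to the index `n * S + s` are all assumptions made for illustration. The only idea taken from the abstract is that adding the period index to the state yields a stationary MDP.

```python
import numpy as np

def augment_periodic_mdp(P, R):
    """Build a stationary MDP over (period index, state) pairs.

    P: array of shape (N, S, A, S); P[n, s, a, s2] is the probability of
       moving from s to s2 under action a at period index n (assumed layout).
    R: array of shape (N, S, A); R[n, s, a] is the mean reward (assumed layout).
    Returns P_aug of shape (N*S, A, N*S) and R_aug of shape (N*S, A), where
    the augmented state (n, s) is stored at index n * S + s.
    """
    N, S, A, _ = P.shape
    P_aug = np.zeros((N * S, A, N * S))
    R_aug = np.zeros((N * S, A))
    for n in range(N):
        nxt = (n + 1) % N  # the period index advances deterministically and wraps
        rows = slice(n * S, (n + 1) * S)
        cols = slice(nxt * S, (nxt + 1) * S)
        P_aug[rows, :, cols] = P[n]  # only the block n -> n+1 is non-zero
        R_aug[rows, :] = R[n]
    return P_aug, R_aug

# Hypothetical example: period N = 3, S = 2 states, A = 2 actions.
rng = np.random.default_rng(0)
N, S, A = 3, 2, 2
P = rng.random((N, S, A, S))
P /= P.sum(axis=-1, keepdims=True)  # normalise each row into a distribution
R = rng.random((N, S, A))
P_aug, R_aug = augment_periodic_mdp(P, R)
```

In this sketch the augmented transition array has a block-cyclic structure: mass only flows from period index `n` to `(n + 1) % N`. The augmented state space has `N * S` states, so its size grows linearly with the period.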

