LGMay 9, 2012

REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs

arXiv:1205.2661v1303 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficient reinforcement learning in weakly communicating MDPs, which is incremental as it builds on prior regret bounds by relating span to diameter-like quantities.

The authors tackled the problem of achieving optimal regret in unknown weakly communicating Markov Decision Processes (MDPs) by introducing REGAL, a regularization-based algorithm, and demonstrated a regret bound of ~O(HSpAT) for MDPs with S states, A actions, and optimal bias vector span H.

We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of ~O(HSpAT). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.

View on arXiv PDF

Similar