LG MLJul 10, 2020

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

arXiv:2007.05456v117.134 citations

Originality Incremental advance

AI Analysis

This work provides an improved theoretical analysis for reinforcement learning algorithms, addressing regret bounds in MDPs, but it appears incremental as it builds on existing UCRL2 methods.

The paper tackles the exploration-exploitation problem in communicating Markov Decision Processes by analyzing UCRL2 with Empirical Bernstein inequalities, achieving a regret bound of $\widetilde{O}(\sqrt{D\Gamma S A T})$ for MDPs with $S$ states, $A$ actions, $\Gamma \leq S$ next states, and diameter $D$.

We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $Γ\leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{DΓS A T})$.

View on arXiv PDF

Similar