LGMLJul 10, 2020

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

arXiv:2007.05456v134 citations
Originality Incremental advance
AI Analysis

This work provides an improved theoretical analysis for reinforcement learning algorithms, addressing regret bounds in MDPs, but it appears incremental as it builds on existing UCRL2 methods.

The paper tackles the exploration-exploitation problem in communicating Markov Decision Processes by analyzing UCRL2 with Empirical Bernstein inequalities, achieving a regret bound of $\widetilde{O}(\sqrt{D\Gamma S A T})$ for MDPs with $S$ states, $A$ actions, $\Gamma \leq S$ next states, and diameter $D$.

We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $Γ\leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{DΓS A T})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes