LGAIGTMLMay 27, 2019

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

arXiv:1905.12425v233 citations
Originality Incremental advance
AI Analysis

This work addresses efficient learning in reinforcement learning for researchers and practitioners, providing a theoretically optimal solution with experimental validation, though it is incremental as it builds on existing variance-based methods.

The paper tackles the problem of model-based reinforcement learning in unknown finite communicating Markov decision processes by proposing the UCRL-V algorithm, which achieves near-optimal regret of $ ilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, closing a gap with the lower bound without additional assumptions.

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validates the theoretical bounds as well as prove UCRL-V to be better than the state-of-the-art algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes