LG AI GT MLMay 27, 2019

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

arXiv:1905.12425v213.133 citations

Originality Incremental advance

AI Analysis

This work addresses efficient learning in reinforcement learning for researchers and practitioners, providing a theoretically optimal solution with experimental validation, though it is incremental as it builds on existing variance-based methods.

The paper tackles the problem of model-based reinforcement learning in unknown finite communicating Markov decision processes by proposing the UCRL-V algorithm, which achieves near-optimal regret of $ ilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, closing a gap with the lower bound without additional assumptions.

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validates the theoretical bounds as well as prove UCRL-V to be better than the state-of-the-art algorithms.

View on arXiv PDF

Similar