A Refined Analysis of UCBVI
This work offers reinforcement learning researchers incremental improvements to an existing algorithm, UCBVI.
The paper refines the UCBVI algorithm by improving its bonus terms and regret analysis. Comparisons with the original version and the state-of-the-art MVP algorithm show that these improvements have significant positive effects on empirical performance.
In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
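To illustrate why the multiplicative constant in a bonus term can matter in practice, here is a minimal sketch of optimistic backward value iteration with a generic Hoeffding-style exploration bonus. It is not the bonus derived in this work or the exact one in Azar et al. (2017). The function name `optimistic_value_iteration`, the constant `c`, and the logarithmic term `log(S * A * H / delta)` are assumptions chosen for illustration, and rewards are assumed to lie in [0, 1].

```python
import numpy as np


def optimistic_value_iteration(counts, reward_sums, H, c=1.0, delta=0.1):
    """Optimistic Q-values from visit statistics (illustrative sketch only).

    counts:      array (S, A, S), number of observed transitions s, a -> s'
    reward_sums: array (S, A), sum of observed rewards, each reward in [0, 1]
    H:           horizon (number of stages)
    c:           hypothetical multiplicative constant scaling the bonus
    delta:       confidence parameter in (0, 1)
    """
    S, A, _ = counts.shape
    n = counts.sum(axis=2)                 # visit counts N(s, a)
    n_safe = np.maximum(n, 1)              # avoid division by zero
    p_hat = counts / n_safe[:, :, None]    # empirical transition model
    r_hat = reward_sums / n_safe           # empirical mean reward

    # Generic Hoeffding-style bonus: c * H * sqrt(log(S A H / delta) / N(s, a)).
    bonus = c * H * np.sqrt(np.log(S * A * H / delta) / n_safe)

    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))
    for h in range(H - 1, -1, -1):
        # Optimistic Bellman backup, clipped at the largest achievable return.
        Q[h] = np.minimum(H - h, r_hat + p_hat @ V[h + 1] + bonus)
        Q[h][n == 0] = H - h               # unvisited pairs stay maximally optimistic
        V[h] = Q[h].max(axis=1)
    return Q, V
```

In this sketch a smaller `c` yields Q-values that are pointwise no larger, so the greedy policy stops exploring poorly visited pairs sooner. The optimism guarantee only holds if the analysis justifies the smaller constant, which is the role a refined analysis plays.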