LG ML · Mar 11, 2013

Monte-Carlo utility estimates for Bayesian reinforcement learning

arXiv:1303.2506v1

Originality: Incremental advance

AI Analysis

This work addresses computational efficiency and performance in Bayesian reinforcement learning, presenting incremental improvements to existing methods.

The paper tackles Bayesian reinforcement learning by introducing Monte-Carlo algorithms for estimating utility bounds: an optimistic policy built from upper bounds on the Bayes-optimal value function, gradient-based algorithms for approximate upper and lower bounds, and gradient methods for Bayesian Bellman error minimization. In the experiments, the upper bound method obtains the most reward, while the Bayesian Bellman error method comes a close second and is computationally much simpler.

This paper introduces a set of algorithms for Monte-Carlo Bayesian reinforcement learning. Firstly, Monte-Carlo estimation of upper bounds on the Bayes-optimal value function is employed to construct an optimistic policy. Secondly, gradient-based algorithms for approximate upper and lower bounds are introduced. Finally, we introduce a new class of gradient algorithms for Bayesian Bellman error minimisation. We theoretically show that the gradient methods are sound. Experimentally, we demonstrate the superiority of the upper bound method in terms of reward obtained. However, we also show that the Bayesian Bellman error method is a close second, despite its significant computational simplicity.
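To make the first idea concrete, here is a minimal sketch of a Monte-Carlo upper bound on the Bayes-optimal value function. It is not the paper's implementation. It relies on the standard inequality that the Bayes-optimal value is at most the posterior expectation of each sampled MDP's own optimal value, since the maximum of an expectation never exceeds the expectation of the maximum. The setup is assumed for illustration: a small discrete MDP, known rewards, an independent Dirichlet posterior over each transition row, and the hypothetical names `value_iteration` and `mc_upper_bound`.

```python
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Optimal state values of one known MDP.

    P: transition probabilities, shape (S, A, S).
    R: expected rewards, shape (S, A).
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)      # Bellman backup, shape (S, A)
        V_new = Q.max(axis=1)        # greedy over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def mc_upper_bound(counts, R, n_samples=32, gamma=0.95, prior=1.0, seed=0):
    """Monte-Carlo estimate of E_posterior[V*_mu], an upper bound on the
    Bayes-optimal value at each state.

    counts: observed transition counts, shape (S, A, S).
    prior:  symmetric Dirichlet pseudo-count (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    S, A, _ = counts.shape
    total = np.zeros(S)
    for _ in range(n_samples):
        # Draw one MDP from the posterior, one Dirichlet per (state, action).
        P = np.empty((S, A, S))
        for s in range(S):
            for a in range(A):
                P[s, a] = rng.dirichlet(counts[s, a] + prior)
        # Solve the sampled MDP exactly and accumulate its optimal values.
        total += value_iteration(P, R, gamma)
    return total / n_samples
```

An optimistic policy in this spirit would act greedily with respect to such a bound. With finitely many samples the average is only an estimate of the bound, so its quality depends on `n_samples`.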
