Bayesian Reinforcement Learning via Deep, Sparse Sampling
This work addresses efficient exploration and planning in reinforcement learning. It appears incremental, as it builds on existing Bayesian methods with specific improvements.
The paper tackles Bayesian reinforcement learning by proposing an optimism-free Bayes-adaptive algorithm. A candidate policy generator is used to build sparser and deeper planning trees; in discrete environments, the algorithm obtains significantly higher reward than state-of-the-art methods at lower computational complexity.
We address the problem of Bayesian reinforcement learning using efficient model-based online planning. We propose an optimism-free Bayes-adaptive algorithm that induces deeper and sparser exploration, with a theoretical bound on its performance relative to the Bayes-optimal policy and a lower computational complexity. The main novelty is the use of a candidate policy generator to generate long-term options in the planning tree (over beliefs), which allows us to create much sparser and deeper trees. Experimental results on different environments show that, in comparison to the state of the art, our algorithm is both computationally more efficient and obtains significantly higher reward in discrete environments.
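To make the idea of a candidate policy generator concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a small discrete MDP with known rewards and a Dirichlet posterior over transitions stored as pseudo-counts. Candidate policies are generated by solving MDPs sampled from the posterior, and each candidate is scored by a few deep rollouts, so the search branches over a handful of long-term options rather than over every action at every step. As a further simplification, the belief is not updated inside the rollouts. All names (`sample_mdp`, `greedy_policy`, `rollout`, `plan`) and parameter values are hypothetical.

```python
import random

def sample_mdp(counts, rng):
    """Draw one transition model from the Dirichlet posterior counts[s][a][s']."""
    model = []
    for s_counts in counts:
        row = []
        for a_counts in s_counts:
            # Normalised Gamma draws give a Dirichlet sample.
            draws = [rng.gammavariate(c, 1.0) for c in a_counts]
            total = sum(draws)
            row.append([d / total for d in draws])
        model.append(row)
    return model

def q_value(model, rewards, v, s, a, gamma):
    return rewards[s][a] + gamma * sum(p * v[t] for t, p in enumerate(model[s][a]))

def greedy_policy(model, rewards, gamma, iters=100):
    """Value iteration on a sampled MDP, then the greedy policy (one candidate)."""
    n_s, n_a = len(model), len(model[0])
    v = [0.0] * n_s
    for _ in range(iters):
        v = [max(q_value(model, rewards, v, s, a, gamma) for a in range(n_a))
             for s in range(n_s)]
    return [max(range(n_a), key=lambda a: q_value(model, rewards, v, s, a, gamma))
            for s in range(n_s)]

def rollout(model, rewards, policy, state, depth, gamma, rng):
    """Discounted return of one deep rollout of a fixed policy in a sampled MDP."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        a = policy[state]
        total += discount * rewards[state][a]
        state = rng.choices(range(len(model)), weights=model[state][a])[0]
        discount *= gamma
    return total

def plan(counts, rewards, state, n_policies=4, n_models=4,
         depth=30, gamma=0.95, seed=0):
    """Return the first action of the best-scoring candidate policy."""
    rng = random.Random(seed)
    # Hypothetical policy generator: solve MDPs sampled from the posterior.
    candidates = [greedy_policy(sample_mdp(counts, rng), rewards, gamma)
                  for _ in range(n_policies)]
    # Sparse evaluation: few sampled models, one deep rollout per model.
    eval_models = [sample_mdp(counts, rng) for _ in range(n_models)]

    def score(policy):
        returns = [rollout(m, rewards, policy, state, depth, gamma, rng)
                   for m in eval_models]
        return sum(returns) / n_models

    return max(candidates, key=score)[state]

# Two states, two actions; uniform prior pseudo-counts; action 1 always pays 1.
counts = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
rewards = [[0.0, 1.0], [0.0, 1.0]]
action = plan(counts, rewards, state=0)
```

In this sketch the cost of planning scales with the number of candidate policies, sampled models, and rollout depth rather than exponentially with depth, which is the intuition behind trading breadth for depth. A fuller Bayes-adaptive treatment would also update the belief along each simulated trajectory.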