Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
This work addresses the exploration-exploitation trade-off in reinforcement learning for researchers, but it is incremental as it builds on existing BAMDP formalism with new theoretical insights and approximations.
The authors tackled the intractability of exact Bayesian reinforcement learning by defining a complexity measure for BAMDP planning that captures the difficulty of information acquisition, and they introduced a state abstraction method to enable a computationally-tractable approximate algorithm.
The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we establish a computationally-intractable, exact planning algorithm that takes advantage of this measure to show more efficient planning. We then conclude by introducing a specific form of state abstraction with the potential to reduce BAMDP complexity and gives rise to a computationally-tractable, approximate planning algorithm.