Practical Open-Loop Optimistic Planning
This work addresses incremental improvements in planning algorithms for researchers in reinforcement learning and AI, offering better practical efficiency for online decision-making under constraints.
The paper tackled the problem of online planning in Markov Decision Processes with a generative model and budget constraints, focusing on open-loop policies, and proposed KLOLOP, a modified algorithm with tighter bounds that improved practical performance while maintaining sample complexity, with an efficient implementation reducing time complexity.
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KLOLOP, that leads to better practical performances while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.