Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond
This work addresses the challenge of exploration-exploitation trade-offs in sequential decision-making for practitioners in fields like Bayesian optimization and active learning, though it is incremental in extending existing methods.
The paper tackles the problem of nonmyopic adaptive planning in Gaussian processes by introducing a framework with Lipschitz continuous reward functions to unify Bayesian optimization and active learning, achieving an epsilon-optimal policy with performance guarantees in empirical tests.
This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive epsilon-optimal GPP (epsilon-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of epsilon-GPP with performance guarantee. We empirically demonstrate the effectiveness of our epsilon-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.