LG AIDec 20, 2023

Bayesian Analysis of Combinatorial Gaussian Process Bandits

Jack Sandberg, Niklas Åkerblom, Morteza Haghir Chehreghani

arXiv:2312.12676v36.63 citationsh-index: 16ICLR

Originality Incremental advance

AI Analysis

This work addresses the problem of optimizing subset selection in volatile environments for researchers in bandit algorithms, though it is incremental as it extends existing methods to new settings.

The authors tackled the combinatorial volatile Gaussian process semi-bandit problem by providing novel Bayesian cumulative regret bounds for GP-UCB, GP-BayesUCB, and GP-TS algorithms, extending previous results to infinite, volatile, and combinatorial settings and offering the first bound for GP-BayesUCB, with application to online energy-efficient navigation showing effectiveness.

We consider the combinatorial volatile Gaussian process (GP) semi-bandit problem. Each round, an agent is provided a set of available base arms and must select a subset of them to maximize the long-term cumulative reward. We study the Bayesian setting and provide novel Bayesian cumulative regret bounds for three GP-based algorithms: GP-UCB, GP-BayesUCB and GP-TS. Our bounds extend previous results for GP-UCB and GP-TS to the infinite, volatile and combinatorial setting, and to the best of our knowledge, we provide the first regret bound for GP-BayesUCB. Volatile arms encompass other widely considered bandit problems such as contextual bandits. Furthermore, we employ our framework to address the challenging real-world problem of online energy-efficient navigation, where we demonstrate its effectiveness compared to the alternatives.

View on arXiv PDF

Similar