Lenient Regret and Good-Action Identification in Gaussian Process Bandits
This work addresses the problem of efficient optimization in bandit settings for researchers and practitioners, offering incremental improvements by relaxing criteria to speed up identification of satisfactory actions.
The paper tackles the problem of Gaussian process bandits under relaxed criteria where function values above a threshold are considered good, introducing lenient regret notions that avoid the typical O(sqrt(T)) penalty for near-optimal actions and providing upper bounds for algorithms like GP-UCB, along with lower bounds. On the practical side, it develops good-action identification algorithms that use threshold knowledge to find a good action faster than standard methods, as shown experimentally.
In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough". On the theoretical side, we study various {\em lenient regret} notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can often find a good action faster than standard optimization-based approaches.