Optimal Order Simple Regret for Gaussian Process Bandits
This work provides a near-optimal solution for sequential optimization problems in fields like hyperparameter tuning or experimental design, though it is incremental as it refines existing bounds rather than introducing a new paradigm.
The paper addresses the gap between lower and upper bounds on simple regret in Gaussian Process bandits for sequential optimization of expensive functions, proving an order-optimal bound of $ ilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ that is significantly tighter than existing results.
Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function $f$. The problem can be cast as a Gaussian Process (GP) bandit where $f$ lives in a reproducing kernel Hilbert space (RKHS). The state of the art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on the simple regret performance. When $N$ is the number of exploration trials and $γ_N$ is the maximal information gain, we prove an $\tilde{\mathcal{O}}(\sqrt{γ_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal up to logarithmic factors for the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models applicable to RKHS elements which may be of broader interest.