Theoretical Analysis of Bayesian Optimisation with Unknown Gaussian Process Hyper-Parameters
This work addresses a crucial but understudied issue in Bayesian optimization, offering theoretical insights that could improve practical applications, though it appears incremental as it builds on existing methods with new analysis.
The paper tackles the problem of setting hyper-parameters in Bayesian optimization, which lacks theoretical analysis, by deriving a cumulative regret bound for Gaussian processes with unknown kernel hyper-parameters in a stochastic setting, providing design guidelines for hyper-parameter estimation methods.
Bayesian optimisation has gained great popularity as a tool for optimising the parameters of machine learning algorithms and models. Somewhat ironically, setting up the hyper-parameters of Bayesian optimisation methods is notoriously hard. While reasonable practical solutions have been advanced, they can often fail to find the best optima. Surprisingly, there is little theoretical analysis of this crucial problem in the literature. To address this, we derive a cumulative regret bound for Bayesian optimisation with Gaussian processes and unknown kernel hyper-parameters in the stochastic setting. The bound, which applies to the expected improvement acquisition function and sub-Gaussian observation noise, provides us with guidelines on how to design hyper-parameter estimation methods. A simple simulation demonstrates the importance of following these guidelines.