What do you Mean? The Role of the Mean Function in Bayesian Optimisation
This work addresses optimisation efficiency for users of Bayesian optimisation; its contribution is incremental, refining an existing component rather than introducing a new paradigm.
The study investigates how the choice of mean function in Gaussian processes affects the convergence rate of Bayesian optimisation. A constant mean function equal to the worst observed value consistently performed best on synthetic problems with dimension ≥5, while more complex mean functions showed potential on real-world tasks, with no clearly optimal choice among them.
Bayesian optimisation is a popular approach for optimising expensive black-box functions. The next location to be evaluated is selected by maximising an acquisition function that balances exploitation and exploration. Gaussian processes, the surrogate models of choice in Bayesian optimisation, are often used with a constant prior mean function equal to the arithmetic mean of the observed function values. We show that the rate of convergence can depend sensitively on the choice of mean function. We empirically investigate eight mean functions (constant functions equal to the arithmetic mean, minimum, median and maximum of the observed function evaluations, linear and quadratic polynomials, random forests and RBF networks) on ten synthetic test problems and two real-world problems, using the Expected Improvement and Upper Confidence Bound acquisition functions. We find that for design dimensions $\ge 5$, a constant mean function equal to the worst observed quality value is consistently the best choice on the synthetic problems considered. We argue that this worst-observed-quality function promotes exploitation, leading to more rapid convergence. However, for the real-world tasks the more complex mean functions capable of modelling the fitness landscape may be effective, although there is no clearly optimal choice.
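
To make the role of a constant prior mean concrete, the following is a minimal NumPy sketch, not the implementation used in the study. It assumes a minimisation problem (so the "worst" observed value is the maximum), a noise-free squared-exponential kernel with a fixed length scale, and hypothetical toy data; the helper names `rbf_kernel` and `gp_posterior_mean` are illustrative only. It shows that far from any observation the Gaussian process posterior mean reverts to the prior mean, so a pessimistic prior mean makes unexplored regions look unattractive, which is one plausible reading of why such a choice favours exploitation.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior_mean(x_train, y_train, x_test, prior_mean, jitter=1e-8):
    """Posterior mean of a noise-free GP with a constant prior mean."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_star = rbf_kernel(x_test, x_train)
    # Condition on the residuals y - m, then add the prior mean back.
    alpha = np.linalg.solve(K, y_train - prior_mean)
    return prior_mean + k_star @ alpha


# Hypothetical toy observations (assumption: lower values are better).
x_train = np.array([0.1, 0.3, 0.5])
y_train = np.array([2.0, 0.5, 1.2])

# The four constant mean functions named in the abstract.
constant_means = {
    "mean": np.mean(y_train),
    "median": np.median(y_train),
    "best": np.min(y_train),    # minimum of the observations
    "worst": np.max(y_train),   # maximum of the observations
}

# Far from the data the kernel correlations vanish, so the
# prediction is determined by the prior mean alone.
x_far = np.array([5.0])
far_predictions = {
    name: gp_posterior_mean(x_train, y_train, x_far, m)[0]
    for name, m in constant_means.items()
}

for name, value in far_predictions.items():
    print(f"{name:>6}: prediction far from data = {value:.3f}")
```

At the training inputs all four choices interpolate the observations, so the differences between them arise only away from the data, where the acquisition function decides whether exploring is worthwhile.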