LG ML | Mar 27, 2020

Incorporating Expert Prior in Bayesian Optimisation via Space Warping

Anil Ramachandran, Sunil Gupta, Santu Rana, Cheng Li, Svetha Venkatesh

arXiv:2003.12250v1 | 13.245 citations | h-index: 60

Originality: Incremental advance

AI Analysis

The work addresses the cost of function evaluations in black-box optimization, a practical concern for researchers and practitioners, but the advance is incremental because it builds on existing Bayesian optimization methods.

The paper tackles the cold start phase of Bayesian optimization in large search spaces. It encodes expert prior knowledge about the optimum as a probability distribution and uses that distribution to warp the search space, expanding high-probability regions and shrinking low-probability ones. The authors report better performance than standard Bayesian optimization on benchmark functions and on hyperparameter tuning for SVM and Random Forest.

Bayesian optimisation is a well-known sample-efficient method for the optimisation of expensive black-box functions. However, when dealing with large search spaces, the algorithm passes through several regions of low function value before reaching the optimum of the function. Since function evaluations are expensive in terms of both money and time, it is desirable to alleviate this problem. One approach to shortening this cold start phase is to use prior knowledge that can accelerate the optimisation. In its standard form, Bayesian optimisation assumes that every point in the search space is equally likely to be the optimum. Therefore, any prior knowledge that provides information about the optimum of the function can improve optimisation performance. In this paper, we represent the prior knowledge about the function optimum through a prior distribution. The prior distribution is then used to warp the search space in such a way that the space expands around the high-probability region of the function optimum and shrinks around the low-probability region. We incorporate this prior directly in the function model (a Gaussian process) by redefining the kernel matrix, which allows the method to work with any acquisition function, i.e. it is an acquisition-agnostic approach. We show the superiority of our method over standard Bayesian optimisation through the optimisation of several benchmark functions and the hyperparameter tuning of two algorithms: Support Vector Machine (SVM) and Random Forest.
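
The warping idea in the abstract can be illustrated with a minimal one-dimensional sketch. The abstract does not give the paper's actual warping function, so everything here is an assumption made for illustration: the prior over the optimum is taken to be a Gaussian on [0, 1], the warp is the normalised CDF of that prior, and the kernel is a squared-exponential kernel evaluated on the warped inputs. The names `warp`, `rbf`, and `warped_kernel_matrix` are hypothetical and not taken from the paper.

```python
import math


def gaussian_cdf(x, mean, std):
    """CDF of a normal distribution, used here as the (assumed) expert prior."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def warp(x, mean, std, lower=0.0, upper=1.0):
    """Map x in [lower, upper] to [0, 1] through the normalised prior CDF.

    Regions where the prior density is high are stretched and regions
    where it is low are compressed. This CDF warp is an assumption, not
    necessarily the transformation used in the paper.
    """
    lo = gaussian_cdf(lower, mean, std)
    hi = gaussian_cdf(upper, mean, std)
    return (gaussian_cdf(x, mean, std) - lo) / (hi - lo)


def rbf(a, b, lengthscale=0.1):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def warped_kernel_matrix(xs, mean, std, lengthscale=0.1):
    """Kernel matrix computed on warped inputs instead of raw inputs."""
    ws = [warp(x, mean, std) for x in xs]
    return [[rbf(wi, wj, lengthscale) for wj in ws] for wi in ws]


# Two pairs of points, each 0.05 apart in the original space:
# one pair near the prior mean (0.5), one pair far from it.
near_gap = warp(0.525, 0.5, 0.1) - warp(0.475, 0.5, 0.1)
far_gap = warp(0.10, 0.5, 0.1) - warp(0.05, 0.5, 0.1)
print(near_gap > 0.05 > far_gap)
```

Because the Gaussian process only sees the kernel matrix, a change of this kind leaves the acquisition function untouched, which is consistent with the acquisition-agnostic property claimed in the abstract. Under this sketch, points near the prior mean become less correlated (the model resolves that region more finely), while points far from it become almost perfectly correlated.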
