LG, ML, Jun 21, 2019

Sparse Spectrum Gaussian Process for Bayesian Optimization

arXiv:1906.08898v2, 6 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in Bayesian optimization for machine learning practitioners, offering an incremental improvement to existing sparse spectrum methods.

The authors address overconfident uncertainty estimates in sparse spectrum Gaussian process approximations used for Bayesian optimization. They propose a regularized marginal likelihood for selecting the frequencies, and their experiments report considerably faster convergence than the vanilla sparse spectrum method and than a full GP whose covariance matrix is ill-conditioned.

We propose a novel sparse spectrum approximation of the Gaussian process (GP) tailored for Bayesian optimization. Whilst current sparse spectrum methods provide the desired approximations for regression problems, this particular form of sparse approximation generates an overconfident GP, i.e. it produces less epistemic uncertainty than the original GP. Since the balance between the predictive mean and the predictive variance is the key determinant of the success of Bayesian optimization, the current sparse spectrum methods are less suitable for it. We derive a new regularized marginal likelihood for finding the optimal frequencies that fixes this overconfidence issue, particularly for Bayesian optimization. The regularizer trades off accuracy in model fitting against a targeted increase in the predictive variance of the resultant GP. Specifically, we use the entropy of the global maximum distribution from the posterior GP as the regularizer, which is to be maximized. Since this distribution cannot be calculated analytically, we first propose a Thompson sampling based approach and then a more efficient sequential Monte Carlo based approach to estimate it. We also show that the Expected Improvement acquisition function can be used as a proxy for the maximum distribution, making the whole process more efficient. Experiments show considerable improvement in Bayesian optimization convergence rate over the vanilla sparse spectrum method, and over a full GP when its covariance matrix is ill-conditioned due to the presence of a large number of observations.
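The Thompson sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard sparse spectrum GP written as Bayesian linear regression on trigonometric features, a finite candidate grid in place of the continuous domain, and toy 1-D data. The names `ss_features`, `ssgp_posterior`, `max_location_entropy`, `noise_var` and `signal_var` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def ss_features(X, W):
    """Trigonometric features for inputs X (n, d) and frequencies W (m, d)."""
    proj = X @ W.T                                   # (n, m)
    m = W.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(m)   # (n, 2m)

def ssgp_posterior(X, y, W, noise_var=1e-2, signal_var=1.0):
    """Posterior over feature weights, with prior w ~ N(0, signal_var * I)."""
    Phi = ss_features(X, W)
    A = Phi.T @ Phi + (noise_var / signal_var) * np.eye(Phi.shape[1])
    A_inv = np.linalg.inv(A)
    mean = A_inv @ Phi.T @ y
    cov = noise_var * A_inv
    cov = 0.5 * (cov + cov.T)                        # symmetrize against round-off
    return mean, cov

def max_location_entropy(X, y, W, candidates, n_samples=500, rng=None, **kw):
    """Estimate the distribution of the maximizer over `candidates` by Thompson
    sampling, and return it with its entropy (in nats)."""
    rng = np.random.default_rng(rng)
    mean, cov = ssgp_posterior(X, y, W, **kw)
    Phi_c = ss_features(candidates, W)               # (c, 2m)
    w_samples = rng.multivariate_normal(mean, cov, size=n_samples)  # (s, 2m)
    f_samples = w_samples @ Phi_c.T                  # one posterior function per row
    argmax = f_samples.argmax(axis=1)                # maximizer index of each sample
    counts = np.bincount(argmax, minlength=len(candidates))
    p = counts / n_samples                           # empirical maximizer distribution
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))
    return p, entropy

# Toy usage (assumed data): noisy sine observations on [-3, 3].
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(20)
# Frequencies drawn from N(0, I), the spectral density of a squared
# exponential kernel with unit lengthscale (an assumption for this sketch).
W = rng.standard_normal((30, 1))
candidates = np.linspace(-3, 3, 200)[:, None]
p, H = max_location_entropy(X, y, W, candidates, rng=1)
print("entropy of estimated maximizer distribution:", H)
```

A low entropy here means the sampled posterior functions agree closely on where the maximum lies, which is the symptom of overconfidence the abstract describes. In the scheme it outlines, an entropy estimate of this kind would be combined with the marginal likelihood when choosing the frequencies `W`; the weighting between the two terms is not given in this text, so it is left out of the sketch.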

