Stochastic Gradient Descent for Gaussian Processes Done Right
This work addresses the scalability problem faced by users of Gaussian process regression, offering an efficient alternative to established methods, though the contribution appears incremental because it builds on known optimization insights.
The paper tackles the computational challenge of solving large linear systems in Gaussian process regression by introducing a stochastic dual descent algorithm, showing it is highly competitive with existing methods like preconditioned conjugate gradients and variational approximations, and achieving performance on par with state-of-the-art graph neural networks in molecular binding affinity prediction.
As is well known, both sampling from the posterior and computing the mean of the posterior in Gaussian process regression reduce to solving a large linear system of equations. We study the use of stochastic gradient descent for solving this linear system, and show that when \emph{done right} -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective. To that end, we introduce a particularly simple \emph{stochastic dual descent} algorithm, explain its design in an intuitive manner and illustrate the design choices through a series of ablation studies. Further experiments demonstrate that our new method is highly competitive. In particular, our evaluations on the UCI regression tasks and on Bayesian optimisation set our approach apart from preconditioned conjugate gradients and variational Gaussian process approximations. Moreover, our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
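To make the reduction to a linear system concrete, here is a minimal sketch in Python with NumPy. It is not the stochastic dual descent algorithm itself, whose details the abstract does not give. It assumes a squared-exponential kernel, a toy one-dimensional dataset, and a plain random-coordinate gradient scheme on the quadratic objective (1/2) a^T (K + s^2 I) a - a^T y, whose minimiser solves (K + s^2 I) a = y. The names `rbf_kernel` and `random_coordinate_solve`, the batch size, the step size, and the noise variance are all illustrative choices, not taken from the paper.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def random_coordinate_solve(K, y, noise_var, batch_size=8, num_steps=20000, seed=0):
    """Approximately solve (K + noise_var * I) alpha = y.

    Each step samples a random batch of coordinates and takes a gradient step
    on those coordinates only, so it reads just batch_size rows of the matrix.
    Illustrative sketch: no momentum, no iterate averaging, no preconditioning.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    A = K + noise_var * np.eye(n)
    # Step size 1 / (largest eigenvalue) guarantees each step decreases the objective.
    # The eigendecomposition is only affordable because this is a toy problem.
    step = 1.0 / np.linalg.eigvalsh(A)[-1]
    alpha = np.zeros(n)
    for _ in range(num_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad_idx = A[idx] @ alpha - y[idx]  # gradient restricted to the sampled coordinates
        alpha[idx] -= step * grad_idx
    return alpha


# Toy data (hypothetical): noisy sine observations.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
noise_var = 1.0  # chosen large so the toy system is well conditioned

K = rbf_kernel(X, X)
alpha = random_coordinate_solve(K, y, noise_var)

# Posterior mean at test inputs: k(x_test, X) @ alpha.
X_test = np.linspace(-3.0, 3.0, 5)[:, None]
posterior_mean = rbf_kernel(X_test, X) @ alpha
```

On a problem this small a direct solve with `np.linalg.solve` is of course preferable; the point of the sketch is only that each iteration touches a few rows of the kernel matrix, which is what makes stochastic approaches attractive when the matrix is too large to factorise.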