Scaling Gaussian Process Regression with Derivatives
This addresses a computational problem for researchers and practitioners who use Gaussian processes in applications such as Bayesian optimization, though the contribution is incremental because it builds on existing iterative methods.
The paper tackles the computational bottleneck of Gaussian process regression with derivatives, whose cost scales prohibitively as O(n^3d^3), by proposing iterative solvers with fast matrix-vector multiplications and pivoted Cholesky preconditioning; combined with dimensionality reduction, these methods let Bayesian optimization with derivatives scale to high-dimensional problems and large evaluation budgets.
Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix, leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that reduces the number of iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enable Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets.
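To make the $n(d+1) \times n(d+1)$ system and the preconditioned iterative solve concrete, here is a minimal NumPy sketch under several assumptions that are ours, not the paper's. It uses an RBF kernel (the abstract does not fix a kernel), orders the observations per point as $[f(x_a), \nabla f(x_a)]$, uses conjugate gradients (CG) as the iterative solver, and applies the inverse of a rank-$k$ pivoted Cholesky preconditioner $LL^\top + \sigma^2 I$ through the Woodbury identity. For clarity the MVMs are dense $\mathcal{O}(n^2d^2)$ products, not the fast $\mathcal{O}(nd)$ MVMs the paper proposes, so this illustrates the preconditioned solve rather than the scaling. All function names (`rbf_grad_kernel`, `pivoted_cholesky`, `make_preconditioner`, `pcg`) are hypothetical.

```python
import numpy as np


def rbf_grad_kernel(X, lengthscale=1.0):
    """Joint covariance of values and gradients under an RBF kernel (assumed kernel).

    Row a*(d+1) is f(x_a); row a*(d+1)+1+i is the i-th partial derivative at x_a.
    """
    n, d = X.shape
    ell2 = lengthscale ** 2
    diff = X[:, None, :] - X[None, :, :]                   # diff[a, b] = x_a - x_b
    K = np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / ell2)   # k(x_a, x_b)
    full = np.empty((n, d + 1, n, d + 1))
    full[:, 0, :, 0] = K
    cross = K[:, :, None] * diff / ell2                    # dk/dx_b = k (x_a - x_b) / ell^2
    full[:, 0, :, 1:] = cross                              # Cov(f(x_a), grad f(x_b))
    full[:, 1:, :, 0] = -cross.transpose(0, 2, 1)          # Cov(grad f(x_a), f(x_b))
    hess = K[:, :, None, None] * (np.eye(d) / ell2
                                  - diff[:, :, :, None] * diff[:, :, None, :] / ell2 ** 2)
    full[:, 1:, :, 1:] = hess.transpose(0, 2, 1, 3)        # Cov(grad f(x_a), grad f(x_b))
    return full.reshape(n * (d + 1), n * (d + 1))


def pivoted_cholesky(A, rank, tol=1e-12):
    """Greedy rank-`rank` factor L with L @ L.T ~= A, pivoting on the largest residual diagonal."""
    L = np.zeros((A.shape[0], rank))
    resid = np.diag(A).copy()
    for m in range(rank):
        p = int(np.argmax(resid))
        if resid[p] <= tol:
            return L[:, :m]
        L[:, m] = (A[:, p] - L[:, :m] @ L[p, :m]) / np.sqrt(resid[p])
        resid -= L[:, m] ** 2
    return L


def make_preconditioner(L, noise):
    """Return v -> (L L^T + noise I)^{-1} v via the Woodbury identity (only a k x k solve)."""
    C = np.linalg.cholesky(noise * np.eye(L.shape[1]) + L.T @ L)
    def apply(v):
        w = np.linalg.solve(C.T, np.linalg.solve(C, L.T @ v))
        return (v - L @ w) / noise
    return apply


def pcg(matvec, b, precond, tol=1e-8, max_iter=2000):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


rng = np.random.default_rng(0)
n, d, noise = 50, 3, 1e-2
X = rng.uniform(size=(n, d))
K = rbf_grad_kernel(X, lengthscale=0.5)      # (n(d+1), n(d+1)) = (200, 200)
A = K + noise * np.eye(K.shape[0])
b = rng.standard_normal(K.shape[0])


def matvec(v):
    return A @ v                              # dense MVM, for illustration only


x_plain, it_plain = pcg(matvec, b, lambda v: v)
L = pivoted_cholesky(K, rank=40)
x_pc, it_pc = pcg(matvec, b, make_preconditioner(L, noise))
print(K.shape, it_plain, it_pc)
```

The printed counts compare plain and preconditioned CG on the same system; the gap depends on the lengthscale, noise level, and preconditioner rank, so treat it as an illustration rather than a reproduction of the paper's reported speedups.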