MLLGApr 29, 2023

Representing Additive Gaussian Processes by Sparse Matrices

arXiv:2305.00324v11 citationsh-index: 7
Originality Incremental advance
AI Analysis

This work addresses scalability challenges in high-dimensional Bayesian optimization and GP modeling, offering incremental improvements to existing back-fitting algorithms.

The paper tackles the problem of efficiently computing posterior variance, log-likelihood, and gradients for additive Matérn Gaussian Processes, which was previously an open issue, by representing these functions with sparse matrices and vectors, achieving time complexities of O(n log n) for these computations and reducing Bayesian optimization acquisition function costs from O(n^2) to O(log n) or O(1).

Among generalized additive models, additive Matérn Gaussian Processes (GPs) are one of the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$ time where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for Additive Matérn GPs, not only the posterior mean, but also the posterior variance, log-likelihood, and gradient of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms to efficiently compute the posterior mean, posterior variance, log-likelihood, and gradient of these three functions for additive GPs, all in $O(n \log n)$ time. We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameters learning, and computations of the acquisition function and its gradient in Bayesian optimization. Given the posterior, our algorithms significantly reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for general learning rate, and even to $O(1)$ for small learning rate.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes