LG | ML | Jun 20, 2023

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Cambridge
arXiv:2306.11589v3 | 29 citations | h-index: 55 | Has Code
Originality: Highly original
AI Analysis

The paper offers a computationally efficient way to scale Gaussian processes to large datasets, a key limitation for practitioners in areas such as sequential decision-making.

The paper tackles the computational bottleneck of solving linear systems in Gaussian processes by using stochastic gradient descent for approximate posterior sampling. It reports state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks, and its uncertainty estimates match those of significantly more expensive baselines on a large-scale Bayesian optimization task.

Gaussian processes are a powerful framework for quantifying uncertainty and for sequential decision-making but are limited by the requirement of solving linear systems. In general, this has a cubic cost in dataset size and is sensitive to conditioning. We explore stochastic gradient algorithms as a computationally efficient method of approximately solving these linear systems: we develop low-variance optimization objectives for sampling from the posterior and extend these to inducing points. Counterintuitively, stochastic gradient descent often produces accurate predictions, even in cases where it does not converge quickly to the optimum. We explain this through a spectral characterization of the implicit bias from non-convergence. We show that stochastic gradient descent produces predictive distributions close to the true posterior both in regions with sufficient data coverage, and in regions sufficiently far away from the data. Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. Its uncertainty estimates match the performance of significantly more expensive baselines on a large-scale Bayesian optimization task.
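To make the core idea in the abstract concrete, the following is a minimal sketch of replacing the linear solve behind a GP posterior mean with mini-batch stochastic gradient descent. It is not the paper's implementation: it uses a plain kernel-ridge-style quadratic objective rather than the low-variance objectives the authors develop, covers only the posterior mean (not posterior samples or inducing points), and every name in it (`rbf_kernel`, `sgd_representer_weights`, the toy data, the step-size rule) is a hypothetical choice made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between 1-D input arrays."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def sgd_representer_weights(K, y, noise_var, batch_size=50, num_steps=3000, seed=0):
    """Approximate v = (K + noise_var * I)^{-1} y with mini-batch SGD.

    Objective (an assumption of this sketch, not the paper's exact one):
        L(v) = 0.5 * sum_i (y_i - k_i^T v)^2 + 0.5 * noise_var * v^T K v
    The data-fit term is subsampled; the regulariser is computed exactly.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    # Conservative step size. It assumes |K_ij| <= 1 (true for the kernel above),
    # which bounds the largest eigenvalue of every mini-batch Hessian.
    step = 1.0 / (n**2 + noise_var * n)
    v = np.zeros(n)
    for _ in range(num_steps):
        idx = rng.choice(n, size=batch_size, replace=False)
        K_b = K[idx]                              # rows of K for the mini-batch
        residual = K_b @ v - y[idx]               # prediction error on the batch
        grad_fit = (n / batch_size) * (K_b.T @ residual)  # unbiased data-fit gradient
        grad_reg = noise_var * (K @ v)            # exact regulariser gradient
        v -= step * (grad_fit + grad_reg)
    return v


# Toy 1-D regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
n_train = 200
noise_var = 0.05
x_train = np.sort(rng.uniform(-3.0, 3.0, size=n_train))
y_train = np.sin(x_train) + np.sqrt(noise_var) * rng.standard_normal(n_train)
x_test = np.linspace(-3.0, 3.0, 50)

K = rbf_kernel(x_train, x_train)
K_test = rbf_kernel(x_test, x_train)

# SGD approximation versus the exact cubic-cost solve.
v_sgd = sgd_representer_weights(K, y_train, noise_var)
v_exact = np.linalg.solve(K + noise_var * np.eye(n_train), y_train)

mean_sgd = K_test @ v_sgd
mean_exact = K_test @ v_exact

rmse_sgd = np.sqrt(np.mean((mean_sgd - mean_exact) ** 2))
rmse_zero = np.sqrt(np.mean(mean_exact**2))  # error of predicting zero everywhere
print(f"RMSE of SGD mean vs exact mean: {rmse_sgd:.4f}")
print(f"RMSE of zero predictor vs exact mean: {rmse_zero:.4f}")
```

With a step size this conservative, the weights `v_sgd` need not be close to `v_exact` along directions associated with small kernel eigenvalues, yet the predictions can still be close because those directions contribute little to `K_test @ v`. This loosely mirrors the abstract's observation that SGD can predict accurately without converging quickly to the optimum.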

Code Implementations: 1 repo
