Fast and accurate conditioning for large-scale and online Gaussian process prediction problems

arXiv:2605.02574
AI Analysis

For practitioners using Gaussian processes, this method provides a scalable alternative to exact GP prediction that maintains high accuracy, particularly beneficial when predicting many points in large connected regions or when prediction points are unknown in advance.

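This work presents a method for fast, accurate Gaussian process prediction that conditions on a small number $r$ of linear combinations of the data (contrasts) rather than on the full dataset. For kernels that are smooth away from the origin, the resulting conditional distributions can match the exact ones to machine precision. Computing the contrasts costs $\mathcal{O}(T r^2)$ work, where $T$ is the cost of solving a linear system with the data covariance matrix. After $\mathcal{O}(n r^2)$ additional precomputation, each online prediction costs $\mathcal{O}(1)$, which suits large datasets and settings where prediction points are not known in advance.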

Gaussian Process (GP) models provide a flexible framework for prediction and uncertainty quantification. For most covariance functions, however, exact GP prediction with $n$ points scales as $\mathcal{O}(n^3)$, making it prohibitively expensive for large datasets or large numbers of prediction points. While nearest neighbor-based prediction can work well in certain settings, non-pathological circumstances (for example measurement noise) can severely restrict its efficiency. This work presents a complementary approach where one conditions on carefully designed linear combinations of data, which is particularly effective in the setting of predicting many values in large connected regions of the data domain. For kernel functions that are smooth away from the origin, conditioning on a small number $r$ of such data contrasts can be machine-precision accurate for the full exact conditional distributions. These contrasts cost $\mathcal{O}(T r^2)$ work to compute where $T$ is the cost of solving a linear system with the data covariance matrix, and so in many cases can be computed in linear or near-linear cost by exploiting rank structure in well-behaved covariance matrices. At the cost of $\mathcal{O}(nr^2)$ additional precomputation work, this approach can also provide predictions at arbitrary points of a designated region in $\mathcal{O}(1)$ online work, making it particularly attractive for problems where prediction points are not known in advance.
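To make the idea of conditioning on linear combinations of data concrete, here is a minimal NumPy sketch. It is not the paper's contrast construction, which the abstract does not spell out. It assumes one simple choice: take the numerical rank-$r$ SVD of the cross-covariance between the prediction points and the data, $K_{*n} \approx U S V^\top$, and condition on the $r$ contrasts $z = V^\top K_{nn}^{-1} y$. When $K_{*n}$ has exact rank $r$, this reproduces the exact conditional mean and covariance. The squared-exponential kernel, the noise level, the rank tolerance, and all variable names are illustrative assumptions, and the dense solves here cost $\mathcal{O}(n^3)$ rather than the structured solves the abstract has in mind.

```python
import numpy as np

def se_kernel(a, b, ell=0.2):
    """Squared-exponential kernel (an assumed, illustrative choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
n, m, noise_var = 400, 200, 0.1

x = np.sort(rng.uniform(0.0, 1.0, n))   # data locations
xs = np.linspace(0.3, 0.7, m)           # prediction points in a connected region
y = np.sin(6.0 * x) + np.sqrt(noise_var) * rng.standard_normal(n)

K = se_kernel(x, x) + noise_var * np.eye(n)  # data covariance (with noise)
Ks = se_kernel(xs, x)                        # cross-covariance, m x n
Kss = se_kernel(xs, xs)                      # prior covariance at prediction points

# Exact GP conditional: dense solves with the full data covariance.
mean_exact = Ks @ np.linalg.solve(K, y)
cov_exact = Kss - Ks @ np.linalg.solve(K, Ks.T)

# Numerical rank of the cross-covariance (tolerance is an assumption).
U, s, Vt = np.linalg.svd(Ks, full_matrices=False)
r = int(np.sum(s > 1e-13 * s[0]))
V = Vt[:r].T                                 # n x r

# Contrast matrix A = V^T K^{-1}; K is symmetric, so A = (K^{-1} V)^T.
# This needs r linear solves with the data covariance matrix.
A = np.linalg.solve(K, V).T                  # r x n
z = A @ y                                    # the r contrasts

# Condition on z alone: only r x r systems are solved from here on.
Szz = A @ K @ A.T                            # Cov(z), r x r
Ksz = Ks @ A.T                               # Cov(f_*, z), m x r
mean_c = Ksz @ np.linalg.solve(Szz, z)
cov_c = Kss - Ksz @ np.linalg.solve(Szz, Ksz.T)

print("contrasts used:", r, "of n =", n)
print("max |mean difference|:", np.max(np.abs(mean_c - mean_exact)))
print("max |cov difference| :", np.max(np.abs(cov_c - cov_exact)))
```

The design point this illustrates is that the data enter the exact conditional distribution only through $K_{*n} K_{nn}^{-1} y$, so when $K_{*n}$ is numerically low rank, a handful of well-chosen linear combinations of $y$ carries essentially all the information relevant to the prediction region. How the contrasts are chosen and computed at near-linear cost is the subject of the paper itself.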

