Anthony A. Stephenson

h-index8

3papers

9citations

Novelty53%

AI Score45

Ranked #43,191 of 194,257 authors (top 22%)#540 in ML (top 16%)

3 Papers

7.4MLJun 26, 2023Code

Leveraging Locality and Robustness to Achieve Massively Scalable Gaussian Process Regression

Robert Allison, Anthony Stephenson, Samuel F et al.

The accurate predictions and principled uncertainty measures provided by GP regression incur O(n^3) cost which is prohibitive for modern-day large-scale applications. This has motivated extensive work on computationally efficient approximations. We introduce a new perspective by exploring robustness properties and limiting behaviour of GP nearest-neighbour (GPnn) prediction. We demonstrate through theory and simulation that as the data-size n increases, accuracy of estimated parameters and GP model assumptions become increasingly irrelevant to GPnn predictive accuracy. Consequently, it is sufficient to spend small amounts of work on parameter estimation in order to achieve high MSE accuracy, even in the presence of gross misspecification. In contrast, as n tends to infinity, uncertainty calibration and NLL are shown to remain sensitive to just one parameter, the additive noise-variance; but we show that this source of inaccuracy can be corrected for, thereby achieving both well-calibrated uncertainty measures and accurate predictions at remarkably low computational cost. We exhibit a very simple GPnn regression algorithm with stand-out performance compared to other state-of-the-art GP approximations as measured on large UCI datasets. It operates at a small fraction of those other methods' training costs, for example on a basic laptop taking about 30 seconds to train on a dataset of size n = 1.6 x 10^6.

5.9MLApr 8Code

The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Robert Allison, Tomasz Maciazek, Anthony Stephenson

Gaussian process ($GP$) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process ($NNGP$) regression for geospatial problems and the related scalable $GPnn$ method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of $NNGP/GPnn$ remains incomplete. We develop a theoretical framework for $NNGP$ and $GPnn$ regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error ($MSE$), calibration coefficient ($CAL$), and negative log-likelihood ($NLL$). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2Î±/(2p+d)}$, where $Î±$ and $p$ capture regularity of the regression problem. We also prove uniform convergence of $MSE$ over compact hyper-parameter sets and show that its derivatives with respect to lengthscale, kernel scale, and noise variance vanish asymptotically, with explicit rates. This explains the observed robustness of $GPnn$ to hyper-parameter tuning. These results provide a rigorous statistical foundation for $NNGP/GPnn$ as a highly scalable and principled alternative to full $GP$ models.

5.3MLNov 15, 2022Code

Provably Reliable Large-Scale Sampling from Gaussian Processes

Anthony Stephenson, Robert Allison, Edward Pyzer-Knapp

When comparing approximate Gaussian process (GP) models, it can be helpful to be able to generate data from any GP. If we are interested in how approximate methods perform at scale, we may wish to generate very large synthetic datasets to evaluate them. Naïvely doing so would cost $\mathcal{O}(n^3)$ flops and $\mathcal{O}(n^2)$ memory to generate a size $n$ sample. We demonstrate how to scale such data generation to large $n$ whilst still providing guarantees that, with high probability, the sample is indistinguishable from a sample from the desired GP.