ML LG Apr 8

The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Robert Allison, Tomasz Maciazek, Anthony Stephenson

arXiv:2604.07267 6.5

AI Analysis

This paper provides a rigorous statistical foundation for scalable Gaussian process regression with nearest neighbours, addressing the cubic-cost bottleneck that practitioners face on massive datasets. The contribution is incremental in that it builds on existing methods.

The paper tackles the scalability limitations of Gaussian process regression by analysing nearest-neighbour methods (NNGP/GPnn). It derives theoretical guarantees for three predictive criteria (MSE, calibration, and negative log-likelihood), proves universal consistency with minimax rates, and shows that the derivatives of the MSE with respect to the hyper-parameters vanish asymptotically, which explains the methods' robustness to hyper-parameter tuning.

Gaussian process ($GP$) regression is a widely used non-parametric modeling tool, but its cubic complexity in the training size limits its use on massive data sets. A practical remedy is to predict using only the nearest neighbours of each test point, as in Nearest Neighbour Gaussian Process ($NNGP$) regression for geospatial problems and the related scalable $GPnn$ method for more general machine-learning applications. Despite their strong empirical performance, the large-$n$ theory of $NNGP/GPnn$ remains incomplete. We develop a theoretical framework for $NNGP$ and $GPnn$ regression. Under mild regularity assumptions, we derive almost sure pointwise limits for three key predictive criteria: mean squared error ($MSE$), calibration coefficient ($CAL$), and negative log-likelihood ($NLL$). We then study the $L_2$-risk, prove universal consistency, and show that the risk attains Stone's minimax rate $n^{-2\alpha/(2p+d)}$, where $\alpha$ and $p$ capture regularity of the regression problem. We also prove uniform convergence of $MSE$ over compact hyper-parameter sets and show that its derivatives with respect to lengthscale, kernel scale, and noise variance vanish asymptotically, with explicit rates. This explains the observed robustness of $GPnn$ to hyper-parameter tuning. These results provide a rigorous statistical foundation for $NNGP/GPnn$ as a highly scalable and principled alternative to full $GP$ models.
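The idea of predicting from only the nearest neighbours of each test point can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the squared-exponential kernel, the zero prior mean, the brute-force neighbour search, and the default hyper-parameter values are all assumptions made for illustration. The calibration formula (mean of squared error divided by predictive variance) is likewise one assumed form of $CAL$; the abstract does not define it.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale, kernel_scale):
    """Squared-exponential kernel between the rows of A and B (assumed kernel choice)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return kernel_scale * np.exp(-0.5 * sq_dists / lengthscale**2)


def nn_gp_predict(X_train, y_train, x_test, m=50,
                  lengthscale=1.0, kernel_scale=1.0, noise_var=0.1):
    """Predictive mean and variance at x_test using only its m nearest training points.

    Hypothetical helper: zero-mean GP prior, Euclidean nearest neighbours.
    """
    # Brute-force neighbour search; a spatial index would replace this at scale.
    dists = np.linalg.norm(X_train - x_test, axis=1)
    idx = np.argsort(dists)[:m]
    X_nn, y_nn = X_train[idx], y_train[idx]

    # Local m x m covariance of the noisy neighbour responses.
    K = rbf_kernel(X_nn, X_nn, lengthscale, kernel_scale) + noise_var * np.eye(len(idx))
    k_star = rbf_kernel(X_nn, x_test[None, :], lengthscale, kernel_scale)[:, 0]

    # Cholesky solve costs O(m^3) per test point, independent of the training size n.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_nn))
    v = np.linalg.solve(L, k_star)

    mean = k_star @ alpha
    var = kernel_scale + noise_var - v @ v  # variance of the noisy test response
    return mean, var


def predictive_metrics(y, mean, var):
    """MSE, an assumed calibration ratio, and Gaussian NLL, averaged over test points."""
    sq_err = (y - mean) ** 2
    mse = sq_err.mean()
    cal = (sq_err / var).mean()  # assumed form of CAL; close to 1 when well calibrated
    nll = 0.5 * (np.log(2.0 * np.pi * var) + sq_err / var).mean()
    return mse, cal, nll
```

Calling `nn_gp_predict` once per test point and passing the collected means and variances to `predictive_metrics` gives empirical counterparts of the three criteria studied in the abstract. Re-running with different `lengthscale`, `kernel_scale`, and `noise_var` values is a simple way to probe the robustness to hyper-parameter tuning on a given data set.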
