ML · LG · AP · CO · ME · Nov 7, 2024

Compactly-supported nonstationary kernels for computing exact Gaussian processes on big data

arXiv:2411.05869v3 · 1 citation · h-index: 24 · Environmetrics
Originality: Incremental advance
AI Analysis

This work addresses scalability and flexibility issues in Gaussian processes (GPs) for researchers and practitioners in machine learning and the Earth sciences. The advance is incremental, since it builds on existing GP frameworks.

The authors tackled the limitations of Gaussian processes (GPs) by developing a compactly-supported nonstationary kernel that enables exact inference on large datasets, demonstrating performance gains over existing methods and applying it to over one million temperature measurements with improved accuracy.

The Gaussian process (GP) is a widely used probabilistic machine learning method with implicit uncertainty characterization for stochastic function approximation, stochastic modeling, and analyzing real-world measurements of nonlinear processes. Traditional implementations of GPs involve stationary kernels (also termed covariance functions) that limit their flexibility, and exact methods for inference that prevent application to data sets with more than about ten thousand points. Modern approaches to address stationarity assumptions generally fail to accommodate large data sets, while all attempts to address scalability focus on approximating the Gaussian likelihood, which can involve subjectivity and lead to inaccuracies. In this work, we explicitly derive an alternative kernel that can discover and encode both sparsity and nonstationarity. We embed the kernel within a fully Bayesian GP model and leverage high-performance computing resources to enable the analysis of massive data sets. We demonstrate the favorable performance of our novel kernel relative to existing exact and approximate GP methods across a variety of synthetic data examples. Furthermore, we conduct space-time prediction based on more than one million measurements of daily maximum temperature and verify that our results outperform state-of-the-art methods in the Earth sciences. More broadly, having access to exact GPs that use ultra-scalable, sparsity-discovering, nonstationary kernels allows GP methods to truly compete with a wide variety of machine learning methods.
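To illustrate why a compactly-supported kernel yields a sparse covariance matrix, the following is a minimal sketch and not the kernel derived in the paper. It assumes a standard Wendland C2 function as the compactly-supported part and a hypothetical input-dependent amplitude `amplitude(x)` as the nonstationary part; the names `wendland`, `amplitude`, `kernel`, and `sparse_covariance`, along with the support radius `rho` and the noise level, are illustrative choices only.

```python
import math

def wendland(r, rho):
    """Wendland C2 function: exactly zero for r >= rho, positive definite in up to 3 dimensions."""
    s = r / rho
    return (1.0 - s) ** 4 * (4.0 * s + 1.0) if s < 1.0 else 0.0

def amplitude(x):
    """Hypothetical input-dependent standard deviation (the nonstationary part); always positive."""
    return 1.0 + 0.5 * math.sin(x)

def kernel(x1, x2, rho=0.5):
    # g(x1) * g(x2) * k(|x1 - x2|) stays positive definite and keeps the compact support of k.
    return amplitude(x1) * amplitude(x2) * wendland(abs(x1 - x2), rho)

def sparse_covariance(xs, rho=0.5, noise=1e-2):
    """Store only nonzero entries as {(i, j): value}; the noise variance is added on the diagonal.

    This loops over all pairs for clarity. A scalable implementation would
    use a neighbor search so that pairs farther apart than rho are never visited.
    """
    K = {}
    for i, xi in enumerate(xs):
        for j, xj in enumerate(xs):
            v = kernel(xi, xj, rho)
            if i == j:
                v += noise
            if v != 0.0:
                K[(i, j)] = v
    return K

# 200 evenly spaced points on [0, 10]; only pairs closer than rho are stored.
xs = [10.0 * i / 199 for i in range(200)]
K = sparse_covariance(xs)
density = len(K) / len(xs) ** 2
print(f"stored {len(K)} of {len(xs) ** 2} entries ({density:.1%})")
```

Because the zeros are exact rather than thresholded, a sparse linear solver applied to this matrix computes the same GP likelihood and predictions as a dense solver would, which is the sense in which inference remains exact.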

