ML | LG | ME | Nov 15, 2022

Provably Reliable Large-Scale Sampling from Gaussian Processes

arXiv:2211.08036v3 | 2 citations | h-index: 23
AI Analysis

The work enables efficient benchmarking of GP approximations, but it is incremental: it improves data generation rather than core GP methods.

The paper tackles the problem of generating large-scale synthetic datasets from Gaussian processes (GPs) for evaluating approximate methods, achieving scalability to large n while providing provable guarantees that samples are indistinguishable from the desired GP.

When comparing approximate Gaussian process (GP) models, it can be helpful to be able to generate data from any GP. If we are interested in how approximate methods perform at scale, we may wish to generate very large synthetic datasets to evaluate them. Naïvely doing so would cost \(\mathcal{O}(n^3)\) flops and \(\mathcal{O}(n^2)\) memory to generate a size \(n\) sample. We demonstrate how to scale such data generation to large \(n\) whilst still providing guarantees that, with high probability, the sample is indistinguishable from a sample from the desired GP.
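To make the cost in the abstract concrete, the following is a minimal sketch of the naïve baseline only, not the paper's scalable method. It assumes a squared-exponential kernel on one-dimensional inputs and additive Gaussian noise; the names `rbf_kernel` and `naive_gp_sample` and all hyperparameter values are illustrative choices, not taken from the paper.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays x and y."""
    sq_dist = (x[:, None] - y[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def naive_gp_sample(x, noise_variance=1e-2, seed=0):
    """Draw one sample y ~ N(0, K + noise_variance * I) via Cholesky."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Dense n x n covariance matrix: O(n^2) memory.
    cov = rbf_kernel(x, x) + noise_variance * np.eye(n)
    # Cholesky factorisation cov = L L^T: O(n^3) flops.
    chol = np.linalg.cholesky(cov)
    # If z ~ N(0, I), then L z ~ N(0, cov).
    z = rng.standard_normal(n)
    return chol @ z


x = np.linspace(0.0, 10.0, 500)
y = naive_gp_sample(x)
```

In this sketch both the dense covariance matrix and its factorisation become impractical as \(n\) grows, which is the bottleneck the paper sets out to avoid.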

Code Implementations: 1 repo