LGOct 4, 2022

Log-Linear-Time Gaussian Processes Using Binary Tree Kernels

Oxford
arXiv:2210.01633v16 citationsh-index: 39
Originality Incremental advance
AI Analysis

This work addresses the scalability problem for practitioners using Gaussian processes in machine learning, offering a faster alternative with strong empirical results, though it is an incremental improvement over sparse methods.

The authors tackled the high computational cost of Gaussian process regression by introducing a binary tree kernel that reduces time complexity from O((n+m)n^2) to O((n+m)log(n+m)), achieving competitive or superior performance on test data likelihood and mean squared error compared to existing methods.

Gaussian processes (GPs) produce good probabilistic models of functions, but most GP kernels require $O((n+m)n^2)$ time, where $n$ is the number of data points and $m$ the number of predictive locations. We present a new kernel that allows for Gaussian process regression in $O((n+m)\log(n+m))$ time. Our "binary tree" kernel places all data points on the leaves of a binary tree, with the kernel depending only on the depth of the deepest common ancestor. We can store the resulting kernel matrix in $O(n)$ space in $O(n \log n)$ time, as a sum of sparse rank-one matrices, and approximately invert the kernel matrix in $O(n)$ time. Sparse GP methods also offer linear run time, but they predict less well than higher dimensional kernels. On a classic suite of regression tasks, we compare our kernel against Matérn, sparse, and sparse variational kernels. The binary tree GP assigns the highest likelihood to the test data on a plurality of datasets, usually achieves lower mean squared error than the sparse methods, and often ties or beats the Matérn GP. On large datasets, the binary tree GP is fastest, and much faster than a Matérn GP.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes