ML · LG · Jan 31, 2018

Kernel Distillation for Fast Gaussian Processes Prediction

arXiv:1801.10273v2
Originality: Incremental advance
AI Analysis

This work addresses the inference-time cost that limits the use of GPs in real-world applications, making prediction faster and more scalable. It is incremental, building on existing inducing point and sparse approximation methods.

The paper tackles the high computational cost of Gaussian processes (GPs) at inference time by introducing kernel distillation, a framework that approximates a fully trained GP model. The distilled model needs only O(m^2) storage for m inducing points and predicts faster. Empirically, it offers a better trade-off between prediction time and test performance than the alternatives.

Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature. However, the use of GPs in real-world applications is limited by their high computational cost at inference time. In this paper, we introduce a new framework, kernel distillation, to approximate a fully trained teacher GP model with a kernel matrix of size $n\times n$ for $n$ training points. We combine the inducing point method with sparse low-rank approximation in the distillation procedure. The distilled student GP model costs only $O(m^2)$ storage for $m$ inducing points, where $m \ll n$, and improves the inference time complexity. We demonstrate empirically that kernel distillation provides a better trade-off between prediction time and test performance than the alternatives.
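
To make the inducing point idea concrete, here is a minimal sketch of a generic subset-of-regressors GP predictor in Python. It is not the paper's distillation procedure, only an illustration of how $m$ inducing points can replace the $n\times n$ kernel matrix at prediction time. The RBF kernel, the function names (`rbf_kernel`, `fit_sor`, `predict_mean`), the toy data, and all hyperparameter values are assumptions chosen for the example.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def fit_sor(X, y, Z, noise=0.1, lengthscale=1.0, jitter=1e-6):
    """Precompute an m-dimensional weight vector (hypothetical helper).

    Only an m x m linear system is solved; the n x n kernel matrix
    over the training points is never formed.
    """
    K_mm = rbf_kernel(Z, Z, lengthscale)  # m x m
    K_mn = rbf_kernel(Z, X, lengthscale)  # m x n
    A = noise**2 * K_mm + K_mn @ K_mn.T + jitter * np.eye(len(Z))
    return np.linalg.solve(A, K_mn @ y)


def predict_mean(X_test, Z, weights, lengthscale=1.0):
    """Predictive mean: one kernel row against m inducing points per test point."""
    return rbf_kernel(X_test, Z, lengthscale) @ weights


# Toy data (assumed for illustration): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 1))           # n = 500 training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
Z = np.linspace(-3.0, 3.0, 15)[:, None]             # m = 15 inducing points

weights = fit_sor(X, y, Z)
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
pred = predict_mean(X_test, Z, weights)
```

After fitting, a mean prediction needs only `Z` and `weights`, so its cost depends on $m$ rather than $n$. Predictive variances would additionally require an $m\times m$ matrix, which is where an $O(m^2)$ storage figure arises in approximations of this kind.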

