When are Iterative Gaussian Processes Reliably Accurate?
This work addresses reliability issues in scalable Gaussian process inference for practitioners, but it is incremental as it builds on existing iterative methods.
The paper tackled the problem of numerical instabilities and poor test likelihoods in iterative Gaussian process inference methods, and found that using a small conjugate gradient tolerance and large root decomposition size, along with L-BFGS-B optimization, improves reliability and convergence.
While recent work on conjugate gradient methods and Lanczos decompositions have achieved scalable Gaussian process inference with highly accurate point predictions, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters, and poor test likelihoods. By investigating CG tolerance, preconditioner rank, and Lanczos decomposition rank, we provide a particularly simple prescription to correct these issues: we recommend that one should use a small CG tolerance ($ε\leq 0.01$) and a large root decomposition size ($r \geq 5000$). Moreover, we show that L-BFGS-B is a compelling optimizer for Iterative GPs, achieving convergence with fewer gradient updates.