Bias-Free Scalable Gaussian Processes via Randomized Truncations
This work addresses hyperparameter bias in scalable Gaussian process methods, an issue relevant to practitioners, though the contribution is incremental because it builds on existing methods.
The paper examines biases in two scalable Gaussian process methods: early truncated conjugate gradients (CG) and random Fourier features (RFF). It finds that CG tends to underfit while RFF tends to overfit, and it introduces randomized truncation estimators that eliminate the bias at the cost of increased variance. For RFF the added variance hurts optimization, whereas for CG the unbiased procedure yields meaningful performance gains.
Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issues using randomized truncation estimators that eliminate bias in exchange for increased variance. In the case of RFF, we show that the bias-to-variance conversion is indeed a trade-off: the additional variance proves detrimental to optimization. However, in the case of CG, our unbiased learning procedure meaningfully outperforms its biased counterpart with minimal additional computation.
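To illustrate the exchange of bias for variance described above, here is a minimal sketch of one randomized truncation scheme, often called a Russian roulette estimator. It is applied to a toy geometric series, not to CG or RFF. The series, the truncation distribution `probs`, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def rr_estimate(terms, probs, j):
    """Russian roulette estimate of terms.sum() when truncating at index j.

    Each kept term is divided by P(J >= its index), so the reweighted
    partial sum has the full sum as its expectation over J.
    """
    # survival[i] = P(J >= i) = 1 - sum of probs over indices below i
    survival = 1.0 - np.concatenate(([0.0], np.cumsum(probs)[:-1]))
    return float(np.sum(terms[: j + 1] / survival[: j + 1]))


def sample_rr(terms, probs, rng):
    """Draw a random truncation index J ~ probs and return the estimate."""
    j = rng.choice(len(terms), p=probs)
    return rr_estimate(terms, probs, j)


# Toy series (an assumption for illustration): 20 geometrically decaying terms.
terms = 0.5 ** np.arange(20)
exact = terms.sum()

# Deterministic early truncation: always keep the first 3 terms (biased).
truncated = terms[:3].sum()

# Hypothetical truncation distribution that favors short truncations.
probs = 0.7 ** np.arange(20)
probs /= probs.sum()

# Exact expectation of the randomized estimator, computed by enumeration.
expected_rr = sum(p * rr_estimate(terms, probs, j) for j, p in enumerate(probs))

# Monte Carlo draws show the price paid: nonzero variance around the exact sum.
rng = np.random.default_rng(0)
samples = np.array([sample_rr(terms, probs, rng) for _ in range(10_000)])

print("exact sum:          ", exact)
print("fixed truncation:   ", truncated)
print("E[randomized]:      ", expected_rr)
print("sample mean and std:", samples.mean(), samples.std())
```

In this toy setting `truncated` falls short of `exact` every time it is computed, while `expected_rr` matches `exact` up to floating-point error. Individual draws from `sample_rr` scatter around that value, which is the added variance the abstract refers to.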