Conjugate Gradients for Kernel Machines
This work addresses scalability issues in kernel methods for large datasets, offering an incremental improvement over existing conjugate gradient techniques.
The authors tackled the cubic complexity of exact regularized least-squares regression in kernel machines by treating the prediction computation as a probabilistic inference problem, resulting in a method that improves kernel ridge regressor approximation over vanilla conjugate gradients and computes posterior variance and log marginal likelihood without extra cost.
Regularized least-squares (kernel-ridge / Gaussian process) regression is a fundamental algorithm of statistics and machine learning. Because generic algorithms for the exact solution have cubic complexity in the number of datapoints, large datasets require to resort to approximations. In this work, the computation of the least-squares prediction is itself treated as a probabilistic inference problem. We propose a structured Gaussian regression model on the kernel function that uses projections of the kernel matrix to obtain a low-rank approximation of the kernel and the matrix. A central result is an enhanced way to use the method of conjugate gradients for the specific setting of least-squares regression as encountered in machine learning. Our method improves the approximation of the kernel ridge regressor / Gaussian process posterior mean over vanilla conjugate gradients and, allows computation of the posterior variance and the log marginal likelihood (evidence) without further overhead.