Faster Kernel Ridge Regression Using Sketching and Preconditioning
This work addresses the scalability of kernel-based machine learning on large-scale data and offers an incremental improvement over existing approximation methods.
The paper tackles the computational cost of Kernel Ridge Regression (KRR) on large datasets by proposing a preconditioner built from random feature maps. The preconditioner accelerates the iterative solution of the KRR linear system and is shown empirically to be effective on datasets of up to one million training examples.
Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are the same as the number of data points, so direct methods are unrealistic for large-scale datasets. In this paper, we propose a preconditioning technique for accelerating the solution of the aforementioned linear system. The preconditioner is based on random feature maps, such as random Fourier features, which have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods, such as kernel ridge regression, by resorting to approximations. However, random feature maps only provide crude approximations to the kernel function, so delivering state-of-the-art results by directly solving the approximated system requires the number of random features to be very large. We show that random feature maps can be much more effective in forming preconditioners, since under certain conditions a not-too-large number of random features is sufficient to yield an effective preconditioner. We empirically evaluate our method and show it is highly effective for datasets of up to one million training examples.
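The idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the system is written as (K + lam*I) alpha = y, the kernel is Gaussian with bandwidth sigma, the feature map is random Fourier features, and the iterative solver is preconditioned conjugate gradient. All function names and the parameters n, d, s, sigma and lam are hypothetical. With a feature matrix Z of size n by s, the preconditioner Z Z^T + lam*I is applied through the Woodbury identity, so only an s by s matrix is factored.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    # Dense Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def random_fourier_features(X, s, sigma, rng):
    # Random Fourier features Z such that Z @ Z.T approximates the Gaussian kernel.
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], s))
    b = rng.uniform(0.0, 2.0 * np.pi, size=s)
    return np.sqrt(2.0 / s) * np.cos(X @ W + b)

def make_preconditioner(Z, lam):
    # Returns r -> (Z Z^T + lam I)^{-1} r using the Woodbury identity:
    # (Z Z^T + lam I)^{-1} = (I - Z (Z^T Z + lam I)^{-1} Z^T) / lam.
    s = Z.shape[1]
    L = np.linalg.cholesky(Z.T @ Z + lam * np.eye(s))
    def apply(r):
        t = np.linalg.solve(L.T, np.linalg.solve(L, Z.T @ r))
        return (r - Z @ t) / lam
    return apply

def pcg(A, b, apply_Minv=None, tol=1e-10, max_iter=2000):
    # Preconditioned conjugate gradient for a symmetric positive definite A.
    # Returns the solution and the number of iterations used.
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r) if apply_Minv is not None else r.copy()
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_Minv(r) if apply_Minv is not None else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Small synthetic example (sizes chosen only so that it runs quickly).
rng = np.random.default_rng(0)
n, d, s, sigma, lam = 300, 2, 100, 1.0, 1.0
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

A = gaussian_kernel(X, sigma) + lam * np.eye(n)
Z = random_fourier_features(X, s, sigma, rng)

alpha_plain, iters_plain = pcg(A, y)
alpha_prec, iters_prec = pcg(A, y, make_preconditioner(Z, lam))
print("CG iterations without preconditioner:", iters_plain)
print("CG iterations with random-feature preconditioner:", iters_prec)
```

In this sketch the random features never replace the kernel matrix in the system being solved; they only shape the preconditioner, so the solver still converges to the solution of the exact system. How much the iteration count drops depends on s, lam and the data, and this toy problem is not meant to reproduce the paper's results.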