A Randomized Approach to Efficient Kernel Clustering
This addresses memory scalability issues for practitioners using kernel clustering on large datasets, though it is incremental as it builds on existing approximation methods.
The paper tackles the high memory requirement of kernel-based K-means clustering by proposing a one-pass randomized kernel approximation method, which experiments show is accurate and requires drastically less memory than standard kernel K-means and significantly less than Nystrom approximations.
Kernel-based K-means clustering has gained popularity due to its simplicity and the power of its implicit non-linear representation of the data. A dominant concern is the memory requirement since memory scales as the square of the number of data points. We provide a new analysis of a class of approximate kernel methods that have more modest memory requirements, and propose a specific one-pass randomized kernel approximation followed by standard K-means on the transformed data. The analysis and experiments suggest the method is accurate, while requiring drastically less memory than standard kernel K-means and significantly less memory than Nystrom based approximations.