Fast Landmark Subspace Clustering
This work addresses efficiency bottlenecks in kernel-based clustering for applications like spectral and subspace clustering, offering a practical speed-up with minimal accuracy trade-offs.
The paper tackles the high time complexity of kernel methods in clustering by introducing a randomization-based approximation for a general class of kernels, resulting in an algorithm with O(KnD) complexity and bounded error. Experiments show that the proposed Fast Landmark Subspace (FLS) clustering algorithm accelerates subspace clustering with only marginal accuracy loss.
Kernel methods obtain superb performance in terms of accuracy for various machine learning tasks since they can effectively extract nonlinear relations. However, their time complexity can be rather large especially for clustering tasks. In this paper we define a general class of kernels that can be easily approximated by randomization. These kernels appear in various applications, in particular, traditional spectral clustering, landmark-based spectral clustering and landmark-based subspace clustering. We show that for $n$ data points from $K$ clusters with $D$ landmarks, the randomization procedure results in an algorithm of complexity $O(KnD)$. Furthermore, we bound the error between the original clustering scheme and its randomization. To illustrate the power of this framework, we propose a new fast landmark subspace (FLS) clustering algorithm. Experiments over synthetic and real datasets demonstrate the superior performance of FLS in accelerating subspace clustering with marginal sacrifice of accuracy.