Accelerate Support Vector Clustering via Spectrum-Preserving Data Compression
The work addresses the scalability bottleneck of support vector clustering (SVC), enabling fast, high-quality clustering on large-scale real-world data sets. It represents a significant incremental improvement.
The paper tackles the slow runtime of support vector clustering (SVC) by compressing the data while preserving its key cluster properties, achieving 100X and 115X speedups over the state-of-the-art SVC method on the Pendigits and USPS data sets, respectively, with better clustering quality.
This paper proposes a novel framework for accelerating support vector clustering. The proposed method first computes much smaller compressed data sets that preserve the key cluster properties of the original data sets, based on a novel spectral data compression approach. The resulting spectrally-compressed data sets are then leveraged to develop a fast and high-quality algorithm for support vector clustering. We conducted extensive experiments using real-world data sets and obtained very promising results. The proposed method allows us to achieve 100X and 115X speedups over the state-of-the-art SVC method on the Pendigits and USPS data sets, respectively, while achieving even better clustering quality. To the best of our knowledge, this represents the first practical method for high-quality and fast SVC on large-scale real-world data sets.
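The two-stage structure described in the abstract (compress first, then cluster the compressed set) can be illustrated with a minimal sketch. This is not the paper's algorithm: the spectral compression step is replaced here by a simple greedy radius-based aggregation, and SVC is replaced by a connected-components clusterer. Both are stand-ins chosen only to keep the example self-contained, and all function names and parameters (`compress`, `cluster_representatives`, `radius`, `link_dist`) are hypothetical.

```python
import math
import random


def compress(points, radius):
    """Stand-in for the compression stage (NOT spectral compression).

    Each point joins the first representative within `radius`;
    otherwise it becomes a new representative.
    """
    reps = []        # coordinates of the representatives
    assignment = []  # assignment[i] = index of the representative of points[i]
    for p in points:
        for j, r in enumerate(reps):
            if math.dist(p, r) <= radius:
                assignment.append(j)
                break
        else:
            reps.append(p)
            assignment.append(len(reps) - 1)
    return reps, assignment


def cluster_representatives(reps, link_dist):
    """Stand-in for SVC: connected components of the graph that links
    representatives closer than `link_dist` (computed with union-find)."""
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if math.dist(reps[i], reps[j]) <= link_dist:
                parent[find(i)] = find(j)

    roots = {}  # maps each component root to a consecutive cluster label
    return [roots.setdefault(find(i), len(roots)) for i in range(len(reps))]


def cluster_via_compression(points, radius, link_dist):
    """Cluster only the representatives, then copy labels back to all points."""
    reps, assignment = compress(points, radius)
    rep_labels = cluster_representatives(reps, link_dist)
    return [rep_labels[a] for a in assignment], len(reps)


# Toy data: two well-separated square blobs of 500 points each.
random.seed(0)
blob_a = [(random.random(), random.random()) for _ in range(500)]
blob_b = [(5 + random.random(), 5 + random.random()) for _ in range(500)]
points = blob_a + blob_b

labels, n_reps = cluster_via_compression(points, radius=0.2, link_dist=1.5)
print(f"{len(points)} points compressed to {n_reps} representatives")
print(f"clusters found: {len(set(labels))}")
```

The expensive clustering step runs on the representatives only, so its cost depends on the size of the compressed set rather than the original one. The quality of the final labels then rests on how well the compression preserves cluster structure, which is the role the paper assigns to its spectrum-preserving approach.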