Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering
This work addresses the efficiency of large-scale clustering, a practical concern for data scientists, though the contribution is incremental: it builds on existing hashing and metric learning techniques.
The paper tackles the computational burden of large-scale agglomerative clustering by replacing exact distance calculations with Hamming distances from Kernelized Locality-Sensitive Hashing, drastically reducing computation time while achieving competitive precision and recall compared to K-Means.
Large-scale agglomerative clustering is hindered by its computational burden. We propose a novel scheme in which the exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH) hash codes. The resulting method drastically decreases computation time. Additionally, we take advantage of certain labeled data points via distance metric learning to achieve precision and recall competitive with K-Means in much less computation time.
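The core idea of clustering on Hamming distances between hash codes can be illustrated with a minimal sketch. It is not the paper's implementation. As a simplifying assumption, it uses plain random-hyperplane sign hashing in the input space as a stand-in for KLSH, a naive average-linkage merge loop, and omits the metric learning step entirely. The function names `hash_codes`, `hamming`, and `agglomerate` are hypothetical and chosen only for illustration.

```python
import random


def hash_codes(points, n_bits, seed=0):
    """Map each point to an n_bits-bit integer code.

    Assumption: random-hyperplane sign hashing, a simple stand-in for KLSH,
    which would instead build its hash functions from kernel evaluations.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    codes = []
    for p in points:
        code = 0
        for w in planes:
            # One bit per hyperplane: which side of the plane the point lies on.
            bit = sum(wi * xi for wi, xi in zip(w, p)) >= 0.0
            code = (code << 1) | int(bit)
        codes.append(code)
    return codes


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def agglomerate(codes, n_clusters):
    """Naive average-linkage agglomerative clustering on Hamming distances.

    Returns a list of clusters, each a list of indices into `codes`.
    """
    clusters = [[i] for i in range(len(codes))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(
                    hamming(codes[i], codes[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b > a, so removing cluster b does not shift the position of cluster a.
        clusters[a].extend(clusters.pop(b))
    return clusters


if __name__ == "__main__":
    data = [[1.0, 0.9], [0.9, 1.1], [-1.0, -1.2], [-1.1, -0.8]]
    codes = hash_codes(data, n_bits=16)
    print(agglomerate(codes, n_clusters=2))
```

Each pairwise comparison here costs one XOR and one bit count on a short code rather than a full distance computation on the original features, which is where a saving of the kind described above would come from. The merge loop itself is deliberately naive and would need a more efficient linkage update at scale.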