LGMLJun 28, 2023

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

arXiv:2306.16354v16 citationsh-index: 36
Originality Incremental advance
AI Analysis

This work addresses scalability issues in clustering for data mining and machine learning applications, such as HDBSCAN, social networks, NLP, and computer vision, though it is incremental as it builds on the existing SLINK algorithm.

The paper tackles the computational bottleneck of single-linkage agglomerative clustering by proposing cuSLINK, a GPU-based algorithm that reduces space complexity to O(Nk) and enables previously intractable applications, achieving state-of-the-art performance.

In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time. We also propose a set of novel and reusable building blocks that compose cuSLINK. These building blocks include highly optimized computational patterns for $k$-NN graph construction, spanning trees, and dendrogram cluster extraction. We show how we used our primitives to implement cuSLINK end-to-end on the GPU, further enabling a wide range of real-world data mining and machine learning applications that were once intractable. In addition to being a primary computational bottleneck in the popular HDBSCAN algorithm, the impact of our end-to-end cuSLINK algorithm spans a large range of important applications, including cluster analysis in social and computer networks, natural language processing, and computer vision. Users can obtain cuSLINK at https://docs.rapids.ai/api/cuml/latest/api/#agglomerative-clustering

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes