LG · AI · Jun 18, 2024

Faithful Density-Peaks Clustering via Matrix Computations on MPI Parallelization System

arXiv:2406.12297v1 · 1 citation · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of scaling density peaks clustering to big data in both Euclidean and non-Euclidean spaces. Its parallel solution is an incremental advance whose gains are higher accuracy on large Euclidean data and support for non-Euclidean data.

The paper tackles two limitations of density peaks clustering, poor scalability and a restriction to Euclidean space, by introducing a parallel method that uses vector-like distance matrices and an inverse leading-node-finding policy on an MPI system. The method achieves improved accuracy on large Euclidean data and enables clustering of non-Euclidean data, such as in community detection.

Density peaks clustering (DP) can detect clusters of arbitrary shape and cluster non-Euclidean data, but its quadratic complexity in both computation and storage makes it difficult to scale to big data. Various approaches have been proposed to address this, including MapReduce-based distributed computing, multi-core parallelism, presentation transformation (e.g., kd-tree, Z-value), granular computing, and so forth. However, most of these existing methods face two limitations. First, their target datasets are mostly constrained to Euclidean space. Second, they emphasize only local neighbors while ignoring the global data distribution, because they are restricted to a cut-off kernel when computing density. To address these two issues, we present a faithful and parallel DP method that makes use of two types of vector-like distance matrices and an inverse leading-node-finding policy. The method is implemented on a message passing interface (MPI) system. Extensive experiments showed that our method is capable of clustering non-Euclidean data, such as in community detection, while outperforming state-of-the-art counterpart methods in accuracy when clustering large Euclidean data. Our code is publicly available at https://github.com/alanxuji/FaithPDP.
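For readers unfamiliar with DP, the following is a minimal serial sketch of the classic density peaks procedure that the abstract takes as its starting point. It is not the paper's parallel method: it does not use the vector-like distance matrices, the inverse leading-node-finding policy, or MPI. The function name `density_peaks`, its parameters, and the toy data are hypothetical and chosen only for illustration. The sketch takes a full pairwise distance matrix (which is why DP is not tied to Euclidean space, and also why storage is quadratic) and contrasts a Gaussian kernel, where every point contributes to the density, with the cut-off kernel, which counts only neighbors closer than `d_c`.

```python
import math

def density_peaks(dist, d_c, n_clusters, kernel="gaussian"):
    """Classic density peaks clustering on an n x n distance matrix (O(n^2))."""
    n = len(dist)
    if kernel == "gaussian":
        # Global kernel: every point contributes, weighted by distance.
        rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2)
                   for j in range(n) if j != i) for i in range(n)]
    else:
        # Cut-off kernel: count only neighbors closer than d_c.
        rho = [float(sum(1 for j in range(n) if j != i and dist[i][j] < d_c))
               for i in range(n)]

    # Points in descending density order (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest denser point
    leader = [-1] * n    # index of that point (the "leading node")
    top = order[0]
    delta[top] = max(dist[top])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: dist[i][k])
        leader[i] = j
        delta[i] = dist[i][j]

    # Centers: the densest point (a simplification made here so that it is
    # always a center), then the points with the largest rho * delta.
    others = sorted((i for i in range(n) if i != top),
                    key=lambda i: -rho[i] * delta[i])
    centers = [top] + others[:n_clusters - 1]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits its leader's label; descending density
    # order guarantees the leader is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[leader[i]]
    return labels, rho, delta, leader

# Toy data: two well-separated blobs of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = [[math.dist(p, q) for q in points] for p in points]

labels, rho, delta, leader = density_peaks(dist, d_c=1.0, n_clusters=2)
labels_cut, _, _, _ = density_peaks(dist, d_c=1.0, n_clusters=2, kernel="cutoff")
print(labels, labels_cut)
```

Because the function only reads `dist`, the same code would run on any precomputed dissimilarity matrix, for example shortest-path distances between graph nodes. The quadratic matrix and the sequential search for each point's leader are the costs that make scaling difficult.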

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
