ML | AI | LG | Nov 11, 2021

Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees

arXiv:2111.06968v1 | 4 citations
Originality: Incremental advance
AI Analysis

The paper addresses poor robustness and reliability in hierarchical clustering, though the contribution appears incremental because it builds on existing clustering methods.

The paper tackles the problem of identifying representative points in hierarchical clustering by proposing a novel algorithm that scores reciprocal nearest data points in sub-minimum-spanning-trees. It reports higher accuracy than the benchmark methods on UCI datasets, and its analysis gives O(n log n) time and O(log n) space complexity.

One of the main challenges in hierarchical clustering is how to identify the representative points at the lower level of the cluster tree, which serve as the roots at the higher level for further aggregation. Conventional hierarchical clustering approaches, however, select these "representative" points with simple heuristics, so the chosen points may not be sufficiently representative. As a result, the constructed cluster tree suffers from poor robustness and weak reliability. To address this issue, we propose a novel hierarchical clustering algorithm that, while building the clustering dendrogram, detects the representative point by scoring the reciprocal nearest data points in each sub-minimum-spanning-tree. Extensive experiments on UCI datasets show that the proposed algorithm is more accurate than the benchmark methods. Our analysis further shows that the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data with low time and storage consumption.
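
The idea of picking a representative from the reciprocal nearest points of each sub-tree can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a sub-minimum-spanning-tree can be approximated by a connected component of the nearest-neighbor graph (each point linked to its closest other point), and it uses a hypothetical score, the total distance from a candidate to the members of its sub-tree, because the abstract does not state the scoring rule. The function names are illustrative, and the brute-force neighbor search is O(n^2), so it does not reproduce the complexity claimed in the abstract.

```python
import math


def nearest_neighbors(points):
    """Return, for each point, the index of its nearest other point (brute force)."""
    nn = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nn.append(min(others, key=lambda j: math.dist(p, points[j])))
    return nn


def sub_trees(nn):
    """Group points into connected components of the nearest-neighbor graph."""
    parent = list(range(len(nn)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(nn):
        parent[find(i)] = find(j)

    groups = {}
    for i in range(len(nn)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def pick_representatives(points):
    """Pick one representative per sub-tree from its reciprocal nearest pair."""
    nn = nearest_neighbors(points)
    reps = []
    for members in sub_trees(nn):
        # Reciprocal nearest points: i and nn[i] are each other's nearest neighbor.
        candidates = [i for i in members if nn[nn[i]] == i]

        # Hypothetical score (an assumption, not the paper's rule):
        # total distance from the candidate to every member of its sub-tree.
        def score(i):
            return sum(math.dist(points[i], points[m]) for m in members)

        reps.append(min(candidates, key=score))
    return sorted(reps)


# Two well-separated groups on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0), (11.0, 0.0), (13.0, 0.0)]
print(pick_representatives(points))  # [1, 4]
```

In a full hierarchical scheme, the selected representatives would become the input points of the next level, and the procedure would repeat until a single root remains.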

Code Implementations: 1 repo
