IRDSJan 22, 2014

On Randomly Projected Hierarchical Clustering with Guarantees

arXiv:1401.5814v112 citations
Originality Incremental advance
AI Analysis

This work addresses scalability issues in hierarchical clustering for large datasets, offering significant speedups with theoretical guarantees, though it is incremental as it builds on existing random projection techniques.

The paper tackles the high runtime cost of hierarchical clustering algorithms by introducing fast methods based on random projections for single and average linkage clustering and the minimum spanning tree problem, achieving improvements of up to a factor of N/(log N)^2 over prior O(N^2) runtimes while maintaining clustering outcomes with high probability.

Hierarchical clustering (HC) algorithms are generally limited to small data instances due to their runtime costs. Here we mitigate this shortcoming and explore fast HC algorithms based on random projections for single (SLC) and average (ALC) linkage clustering as well as for the minimum spanning tree problem (MST). We present a thorough adaptive analysis of our algorithms that improve prior work from $O(N^2)$ by up to a factor of $N/(\log N)^2$ for a dataset of $N$ points in Euclidean space. The algorithms maintain, with arbitrary high probability, the outcome of hierarchical clustering as well as the worst-case running-time guarantees. We also present parameter-free instances of our algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes