LGMLOct 19, 2024

Accelerating k-Means Clustering with Cover Trees

arXiv:2410.15117v11 citationsh-index: 4SISAP
Originality Incremental advance
AI Analysis

This work addresses efficiency issues in clustering for data analysis applications, representing an incremental improvement over existing acceleration techniques.

The paper tackles the problem of accelerating k-means clustering by proposing a new algorithm based on cover trees, which reduces distance computations and outperforms previous k-d tree methods across a wider parameter range.

The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster distances and the triangle inequality to reduce the number of distance computations, with only arrays as underlying data structures. These approaches cannot exploit that nearby points are likely assigned to the same cluster. We propose a new k-means algorithm based on the cover tree index, that has relatively low overhead and performs well, for a wider parameter range, than previous approaches based on the k-d tree. By combining this with upper and lower bounds, as in state-of-the-art approaches, we obtain a hybrid algorithm that combines the benefits of tree aggregation and bounds-based filtering.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes