LGJul 8, 2021

Accelerating Spherical k-Means

Erich Schubert, Andreas Lang, Gloria Feher

arXiv:2107.04074v15.511 citationsHas Code

Originality Synthesis-oriented

AI Analysis

This work addresses efficiency improvements for clustering sparse, high-dimensional data like documents, but it is incremental as it adapts existing methods to a specific variant.

The paper tackled the problem of accelerating spherical k-means clustering by adapting Elkan and Hamerly acceleration techniques from Euclidean to Cosine similarities, resulting in a substantial speedup evaluated on real data.

Spherical k-means is a widely used clustering algorithm for sparse and high-dimensional data such as document vectors. While several improvements and accelerations have been introduced for the original k-means algorithm, not all easily translate to the spherical variant: Many acceleration techniques, such as the algorithms of Elkan and Hamerly, rely on the triangle inequality of Euclidean distances. However, spherical k-means uses Cosine similarities instead of distances for computational efficiency. In this paper, we incorporate the Elkan and Hamerly accelerations to the spherical k-means algorithm working directly with the Cosines instead of Euclidean distances to obtain a substantial speedup and evaluate these spherical accelerations on real data.

View on arXiv PDF Code

Similar