DS, LG · Nov 28, 2022

A Faster $k$-means++ Algorithm

Jiehao Liang, Somdeb Sarkhel, Zhao Song, Chenbo Yin, Junze Yin, Danyang Zhuo

arXiv:2211.15118v2 · 6.65 citations · h-index: 19

Originality: Incremental advance

AI Analysis

This work gives a faster algorithm for the initialization step of $k$-means clustering. The advance is incremental, but it is useful for practitioners working with large datasets.

The paper tackles the problem of accelerating the $k$-means++ initialization algorithm for $k$-means clustering. It achieves a total running time of $\widetilde{O}(nd + nk^2)$, improving on the previous state-of-the-art bound of $\widetilde{O}(ndk^2)$.
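To make the size of the improvement concrete, the ratio of the two bounds can be worked out directly. This ignores constants and the polylogarithmic factors hidden by $\widetilde{O}$, so it is a rough guide to the asymptotic gap rather than a measured speedup:

$$
\frac{ndk^2}{nd + nk^2} = \frac{dk^2}{d + k^2} \ge \frac{dk^2}{2\max(d, k^2)} = \frac{\min(d, k^2)}{2}.
$$

For example, with $d = 100$ and $k = 10$ the ratio is $10^4 / 200 = 50$, which matches the lower bound because $d = k^2$ in that case.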

$k$-means++ is an important algorithm for choosing the initial cluster centers of the $k$-means clustering algorithm. In this work, we present a new algorithm that solves the $k$-means++ problem in nearly optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k)$ iterations, each taking $\widetilde{O}(ndk)$ time, for an overall running time of $\widetilde{O}(ndk^2)$. Our new algorithm, \textsc{FastKmeans++}, takes only $\widetilde{O}(nd + nk^2)$ time in total.
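For reference, the abstract does not restate how $k$-means++ picks its centers. Below is a minimal NumPy sketch of the classic seeding rule of Arthur and Vassilvitskii ($D^2$ sampling): the first center is chosen uniformly, and each later center is sampled with probability proportional to a point's squared distance to its nearest already-chosen center. This is *not* the paper's \textsc{FastKmeans++}; the function name `kmeanspp_seed` and its signature are hypothetical, chosen only for illustration.

```python
import numpy as np


def kmeanspp_seed(X, k, rng=None):
    """Classic k-means++ (D^2) seeding. Hypothetical helper, not FastKmeans++.

    Returns the indices (rows of X) of the k chosen initial centers.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    rng = np.random.default_rng(rng)

    # Step 1: pick the first center uniformly at random.
    centers = [int(rng.integers(n))]
    # d2[i] = squared distance from point i to its nearest chosen center.
    d2 = np.sum((X - X[centers[0]]) ** 2, axis=1)

    for _ in range(1, k):
        total = d2.sum()
        if total > 0:
            # Step 2: sample the next center with probability proportional to d2.
            idx = int(rng.choice(n, p=d2 / total))
        else:
            # Every point coincides with a chosen center (duplicate data):
            # fall back to a uniform pick among the points not yet chosen.
            remaining = np.setdiff1d(np.arange(n), centers)
            idx = int(rng.choice(remaining))
        centers.append(idx)
        # Step 3: refresh each point's distance to its nearest center.
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))

    return np.array(centers)
```

For example, `kmeanspp_seed(X, 8, rng=0)` returns 8 row indices of `X` that could seed Lloyd's iterations. Keeping the running minimum `d2` avoids recomputing distances to every earlier center on each step.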

