LG · Feb 22, 2022

Convergence of online $k$-means

Sanjoy Dasgupta, Gaurav Mahajan, Geelon So

arXiv:2202.10640v13.35 citations

Originality: Incremental advance

AI Analysis

This paper provides theoretical convergence guarantees for a widely used clustering method in streaming-data settings. The advance is incremental, since it extends existing optimization techniques rather than introducing new ones.

The paper tackles the problem of proving asymptotic convergence for online k-means algorithms on streaming data, showing that centers converge to stationary points of the k-means cost function by interpreting the algorithm as stochastic gradient descent with a stochastic learning rate schedule.

We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function. To do so, we show that online $k$-means over a distribution can be interpreted as stochastic gradient descent with a stochastic learning rate schedule. Then, we prove convergence by extending techniques used in optimization literature to handle settings where center-specific learning rates may depend on the past trajectory of the centers.
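The SGD interpretation in the abstract can be illustrated with a minimal sketch. It assumes one simple member of the class of algorithms, a MacQueen-style update in which each center's learning rate is the reciprocal of the number of points it has absorbed so far; the paper's analysis covers a more general class, and this is not the authors' code. The names `online_kmeans`, `stream`, and `init_centers` are hypothetical. Each arriving point moves only its nearest center, and that move is a gradient step on the per-point cost `0.5 * ||x - c_i||^2`, whose gradient with respect to `c_i` is `c_i - x`.

```python
import numpy as np

def online_kmeans(stream, init_centers):
    """Illustrative MacQueen-style online k-means (one pass over a stream).

    Assumption: the learning rate for center i is 1 / counts[i], where
    counts[i] starts at 1 so the initial center counts as one observation.
    """
    centers = np.array(init_centers, dtype=float)
    counts = np.ones(len(centers), dtype=int)
    for x in stream:
        x = np.asarray(x, dtype=float)
        # Assign the point to its nearest center (squared Euclidean distance).
        i = int(np.argmin(np.sum((centers - x) ** 2, axis=1)))
        counts[i] += 1
        # Center-specific learning rate: it depends on how often center i was
        # updated, and therefore on the past trajectory of the centers.
        eta = 1.0 / counts[i]
        # Stochastic gradient step on 0.5 * ||x - c_i||^2 with respect to c_i.
        centers[i] -= eta * (centers[i] - x)
    return centers, counts

centers, counts = online_kmeans(
    stream=[[2.0, 0.0], [10.0, 12.0], [1.0, 0.0]],
    init_centers=[[0.0, 0.0], [10.0, 10.0]],
)
```

With this choice of rate, each center is the running mean of its initial position and the points assigned to it. The rate is random because the assignments are, which is the sense in which the learning rate schedule is stochastic.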
