Fully Dynamic Euclidean k-Means
This work addresses the problem of efficiently maintaining clustering solutions for dynamic data in Euclidean space, which is incremental by building on prior frameworks but introduces novel data structures to overcome limitations.
The paper tackles the Euclidean k-means clustering problem in a dynamic setting with point insertions and deletions, achieving a poly(1/ε)-approximation ratio with Õ(k^ε) update time and Õ(1) recourse, which is the first algorithm to break the O(k) update time barrier for constant approximation.
We consider the Euclidean $k$-means clustering problem in a dynamic setting, where we have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ subject to point insertions/deletions in $\mathbb{R}^d$. We present a dynamic algorithm for Euclidean $k$-means with $\mathrm{poly}(1/ε)$-approximation ratio, $\tilde{O}(k^ε)$ update time, and $\tilde{O}(1)$ recourse, for any $ε\in (0,1)$, even when $d$ and $k$ are both part of the input. This is the first algorithm to achieve a constant ratio with $o(k)$ update time for this problem, whereas the previous $O(1)$-approximation runs in $\tilde O(k)$ update time [Bhattacharya, Costa, Farokhnejad; STOC'25]. In fact, previous algorithms cannot go beyond $O(k)$ update time precisely because they are designed for general metrics where an $Ω(k)$ lower bound is known. We break this $O(k)$ barrier by devising new fundamental data structures to utilize Euclidean properties: a structure that (implicitly) maintains a clustering subject to both center and data point updates, and a range query structure that can evaluate a mergeable function over any metric ball range given as a query. To obtain these structures, we devise the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Vesel{ý}, Yang; FOCS'22] that achieves $\tilde O(n^ε)$ running time per point evaluation with competitive parameters. Our final algorithm exploits the framework of [Bhattacharya, Costa, Farokhnejad; STOC'25] for general metrics. The key change is to redesign several critical subroutines so that they reduce to our new Euclidean data structures, replacing the general-metric implementations that are unlikely to run efficiently even when Euclidean properties are provided.