Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data
This addresses the problem of efficient clustering in dynamic, high-dimensional data streams for applications like real-time analytics, and it is incremental as it builds on prior coreset methods but extends them to dynamic updates.
The paper tackles the dynamic $k$-means clustering problem for high-dimensional data streams with insertions and deletions, achieving a one-pass coreset construction algorithm using space nearly linear in $k$ and polynomial in dimension, which is the first such result in this setting.
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, Δ\}^d$ can be dynamically inserted to or deleted from the dataset. For this problem, we provide a one-pass coreset construction algorithm using space $\tilde{O}(k\cdot \mathrm{poly}(d, \logΔ))$, where $k$ is the target number of centers. To our knowledge, this is the first dynamic geometric data stream algorithm for $k$-means using space polynomial in dimension and nearly optimal (linear) in $k$.