DSLGMLFeb 1, 2018

Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data

arXiv:1802.00459v22 citations
AI Analysis

This addresses the problem of efficient clustering in dynamic, high-dimensional data streams for applications like real-time analytics, and it is incremental as it builds on prior coreset methods but extends them to dynamic updates.

The paper tackles the dynamic $k$-means clustering problem for high-dimensional data streams with insertions and deletions, achieving a one-pass coreset construction algorithm using space nearly linear in $k$ and polynomial in dimension, which is the first such result in this setting.

We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, Δ\}^d$ can be dynamically inserted to or deleted from the dataset. For this problem, we provide a one-pass coreset construction algorithm using space $\tilde{O}(k\cdot \mathrm{poly}(d, \logΔ))$, where $k$ is the target number of centers. To our knowledge, this is the first dynamic geometric data stream algorithm for $k$-means using space polynomial in dimension and nearly optimal (linear) in $k$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes