DSLGJun 14, 2021

Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space

arXiv:2106.07319v11 citations
Originality Incremental advance
AI Analysis

This provides a unified approach for constrained clustering in streaming models, but it is incremental as it extends an existing technique to broader constraints.

The paper tackles the problem of constrained k-median and k-means clustering in low-dimensional Euclidean space by showing that a technique for fair k-means implies streaming algorithms for a wide range of constraints, with running time exponential in k.

We study (Euclidean) $k$-median and $k$-means with constraints in the streaming model. There have been recent efforts to design unified algorithms to solve constrained $k$-means problems without using knowledge of the specific constraint at hand aside from mild assumptions like the polynomial computability of feasibility under the constraint (compute if a clustering satisfies the constraint) or the presence of an efficient assignment oracle (given a set of centers, produce an optimal assignment of points to the centers which satisfies the constraint). These algorithms have a running time exponential in $k$, but can be applied to a wide range of constraints. We demonstrate that a technique proposed in 2019 for solving a specific constrained streaming $k$-means problem, namely fair $k$-means clustering, actually implies streaming algorithms for all these constraints. These work for low dimensional Euclidean space. [Note that there are more algorithms for streaming fair $k$-means today, in particular they exist for high dimensional spaces now as well.]

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes