On Coreset Constructions for the Fuzzy $K$-Means Problem
This provides an efficient solution for soft clustering in large-scale or streaming data, though it is incremental as it builds on existing coreset methods for K-means.
The paper tackles the fuzzy K-means problem by presenting the first coresets with size linear in dimension, polynomial in clusters, and poly-logarithmic in points, enabling a (1+ε)-approximation and maintenance in streaming settings.
The fuzzy $K$-means problem is a popular generalization of the well-known $K$-means problem to soft clusterings. We present the first coresets for fuzzy $K$-means with size linear in the dimension, polynomial in the number of clusters, and poly-logarithmic in the number of points. We show that these coresets can be employed in the computation of a $(1+ε)$-approximation for fuzzy $K$-means, improving previously presented results. We further show that our coresets can be maintained in an insertion-only streaming setting, where data points arrive one-by-one.