MLLGNov 27, 2017

One-Shot Coresets: The Case of k-Clustering

arXiv:1711.09649v341 citations
Originality Highly original
AI Analysis

This addresses the need for scalable and flexible data summarization in clustering for practitioners, offering a novel solution beyond problem-dependent methods.

The paper tackles the problem of constructing small data summaries that work for a wide range of k-clustering objectives simultaneously, proposing an efficient algorithm that achieves strong theoretical guarantees.

Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent - the practitioner has to commit to a fixed clustering objective before even exploring the data. However, can one construct small data summaries for a wide range of clustering problems simultaneously? In this work, we affirmatively answer this question by proposing an efficient algorithm that constructs such one-shot summaries for k-clustering problems while retaining strong theoretical guarantees.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes