DSSEJan 13, 2017

Streaming k-Means Clustering with Fast Queries

arXiv:1701.03826v23 citations
Originality Incremental advance
AI Analysis

This addresses the need for efficient real-time clustering in streaming applications, representing an incremental advance over existing methods.

The paper tackled the problem of enabling fast clustering queries on streaming data for k-means, achieving substantial improvements in query time while maintaining low approximation error and space usage.

We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current clustering query. We present both theoretical analysis and detailed experiments demonstrating their correctness and efficiency

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes