DS SEJan 13, 2017

Streaming k-Means Clustering with Fast Queries

Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura

arXiv:1701.03826v23.33 citations

Originality Incremental advance

AI Analysis

This addresses the need for efficient real-time clustering in streaming applications, representing an incremental advance over existing methods.

The paper tackled the problem of enabling fast clustering queries on streaming data for k-means, achieving substantial improvements in query time while maintaining low approximation error and space usage.

We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current clustering query. We present both theoretical analysis and detailed experiments demonstrating their correctness and efficiency

View on arXiv PDF

Similar