DB · IR · Oct 2, 2017

Clustering Stream Data by Exploring the Evolution of Density Mountain

arXiv:1710.00867v1 · 51 citations
Originality: Incremental advance
AI Analysis

This paper addresses real-time clustering and cluster-evolution tracking in streaming data analysis, and represents an incremental improvement over existing methods.

The paper tackles the problem of stream clustering by proposing EDMStream, which uses density mountains to update clusters efficiently and track their evolution, achieving 7-15x faster updates than competitors while maintaining comparable cluster quality.

Stream clustering is a fundamental problem in many streaming data analysis applications. Compared with classical batch-mode clustering, stream clustering poses two key challenges: (i) given that the input data change continuously, how can clustering results be updated incrementally and efficiently? (ii) Given that clusters evolve continuously as the data evolve, how can cluster evolution activities be captured? Unfortunately, most existing stream clustering algorithms can neither update the clustering result in real time nor track the evolution of clusters. In this paper, we propose a stream clustering algorithm, EDMStream, that works by exploring the Evolution of Density Mountain. The density mountain is used to abstract the data distribution, and changes in it indicate that the data distribution is evolving. We track the evolution of clusters by monitoring the changes of density mountains. We further provide efficient data structures and filtering schemes to ensure that density mountains are updated in real time, which makes online clustering possible. Experimental results on synthetic and real datasets show that, compared with state-of-the-art stream clustering algorithms such as D-Stream, DenStream, DBSTREAM and MR-Stream, our algorithm responds to a cluster update much faster (roughly 7-15x faster than the best of the competitors) while achieving comparable cluster quality. Furthermore, EDMStream successfully captures cluster evolution activities.
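The abstract does not define the density mountain formally. As a rough illustration only, the sketch below assumes a density-peaks style reading: each point gets a local density, links to its nearest neighbour of higher density, and a "mountain" (cluster) is cut off wherever that link is too long. The function name `density_peak_clusters`, the parameters `radius` and `link_threshold`, and the batch (non-streaming) formulation are all assumptions made for this example, not EDMStream itself.

```python
import math

def density_peak_clusters(points, radius, link_threshold):
    """Toy batch sketch (hypothetical, not the paper's algorithm)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points within `radius`.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}
    labels = [None] * n
    next_label = 0
    for i in order:
        higher = [j for j in range(n) if rank[j] < rank[i]]
        if higher:
            # Nearest point that ranks higher in density.
            parent = min(higher, key=lambda j: dist[i][j])
            if dist[i][parent] <= link_threshold:
                labels[i] = labels[parent]  # same mountain as its parent
                continue
        # No close higher-density point: this point is a new peak.
        labels[i] = next_label
        next_label += 1
    return labels

# Two well-separated clumps of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels = density_peak_clusters(points, radius=1.5, link_threshold=3.0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In a streaming setting, the densities and links would have to be maintained incrementally as points arrive and decay; this sketch recomputes everything from scratch and so shows only the clustering idea, not the real-time update.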

