ML · LG · Jan 8, 2018

Online Cluster Validity Indices for Streaming Data

arXiv:1801.02937v1 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient cluster validation in streaming applications, representing an incremental extension of offline methods to online settings.

The paper tackles the problem of validating cluster quality in streaming data by developing online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin indices, and shows that the incremental Xie-Beni index with a forgetting factor outperformed the other three indices in its numerical tests.

Cluster analysis is used to explore structure in unlabeled data sets in a wide range of applications. An important part of cluster analysis is validating the quality of computationally obtained clusters. A large number of different internal indices have been developed for validation in the offline setting. However, this concept has not been extended to the online setting. A key challenge is to find an efficient incremental formulation of an index that can capture both cohesion and separation of the clusters over potentially infinite data streams. In this paper, we develop two online versions (with and without forgetting factors) of the Xie-Beni and Davies-Bouldin internal validity indices, analyze their characteristics using two streaming clustering algorithms (sk-means and online ellipsoidal clustering), and illustrate their use in monitoring evolving clusters in streaming data. We also show that incremental cluster validity indices are capable of sending a distress signal to online monitors when evolving clusters go awry. Our numerical examples indicate that the incremental Xie-Beni index with forgetting factor is superior to the other three indices tested.
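The abstract does not spell out the update equations, so the following is only a minimal sketch of what an incremental Xie-Beni and Davies-Bouldin index could look like, not the authors' formulation. It assumes hard (crisp) cluster assignments, Euclidean distance (online ellipsoidal clustering would instead call for a Mahalanobis-type distance), centroids supplied by whatever streaming clusterer is running, and a forgetting factor `forgetting` in (0, 1] applied as exponential discounting of per-cluster statistics, with 1.0 meaning no forgetting. The class `IncrementalIndices` and its methods are hypothetical names. The point it illustrates is that both indices can be recomputed from O(c) running sums, so memory does not grow with the stream.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumption; ellipsoidal clusters would use a Mahalanobis-type distance)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class IncrementalIndices:
    """Hypothetical incremental Xie-Beni / Davies-Bouldin tracker for hard clusters."""

    def __init__(self, n_clusters: int, forgetting: float = 1.0) -> None:
        if n_clusters < 2:
            raise ValueError("both indices need at least two clusters")
        if not 0.0 < forgetting <= 1.0:
            raise ValueError("forgetting factor must lie in (0, 1]")
        self.lam = forgetting               # 1.0 means every sample keeps full weight
        self.weight = [0.0] * n_clusters    # discounted sample count per cluster
        self.sse = [0.0] * n_clusters       # discounted sum of squared distances to the centroid
        self.sd = [0.0] * n_clusters        # discounted sum of (unsquared) distances to the centroid

    def update(self, x: Point, label: int, centroids: List[Point]) -> None:
        """Fold one sample into the running sums.

        Assumption: the distance is taken to the centroid as it stands when x
        arrives; older terms are not recomputed as centroids drift.
        """
        d2 = sq_dist(x, centroids[label])
        # Exponentially discount every cluster's history, then add the new sample.
        for i in range(len(self.weight)):
            self.weight[i] *= self.lam
            self.sse[i] *= self.lam
            self.sd[i] *= self.lam
        self.weight[label] += 1.0
        self.sse[label] += d2
        self.sd[label] += math.sqrt(d2)

    def xie_beni(self, centroids: List[Point]) -> float:
        """Weighted mean squared distance over minimum squared centroid separation; lower is better."""
        total = sum(self.weight)
        if total == 0.0:
            raise ValueError("no samples seen yet")
        c = len(centroids)
        min_sep = min(sq_dist(centroids[i], centroids[j])
                      for i in range(c) for j in range(i + 1, c))
        return (sum(self.sse) / total) / min_sep

    def davies_bouldin(self, centroids: List[Point]) -> float:
        """Mean over clusters of the worst (S_i + S_j) / ||v_i - v_j|| ratio; lower is better."""
        c = len(centroids)
        # S_i: discounted mean distance of cluster i's samples to its centroid.
        scatter = [self.sd[i] / self.weight[i] if self.weight[i] > 0 else 0.0
                   for i in range(c)]
        worst = [max((scatter[i] + scatter[j]) / math.sqrt(sq_dist(centroids[i], centroids[j]))
                     for j in range(c) if j != i)
                 for i in range(c)]
        return sum(worst) / c


if __name__ == "__main__":
    # Toy stream: two tight clusters around fixed, well-separated centroids.
    centroids = [(0.0, 0.0), (10.0, 0.0)]
    tracker = IncrementalIndices(n_clusters=2, forgetting=0.99)
    for point, label in [((1.0, 0.0), 0), ((-1.0, 0.0), 0), ((9.0, 0.0), 1), ((11.0, 0.0), 1)]:
        tracker.update(point, label, centroids)
    print(tracker.xie_beni(centroids), tracker.davies_bouldin(centroids))
```

A smaller forgetting factor lets recent samples dominate the statistics, which is what allows an index to react when clusters drift. Monitoring either value over time and flagging a sustained rise is one plausible reading of the "distress signal" described above; the alarm threshold would be a separate design choice not covered by this sketch.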
