A Clustering-based Framework for Classifying Data Streams
This work addresses data stream classification for applications that require real-time adaptation, although it appears incremental because it builds on existing clustering and active learning techniques.
The paper tackles the problem of classifying non-stationary data streams without an initial label set. It uses a clustering-based framework with active label querying and sub-cluster handling for class overlap, and it reports performance that is statistically better than or comparable to that of existing methods.
The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques to data streams, these approaches either require an initial label set or rely on specialized design parameters. Overlap among classes and the labeling of data streams are further major challenges for data stream classification. In this paper, we propose a clustering-based data stream classification framework that handles non-stationary data streams without an initial label set. A density-based stream clustering procedure with a dynamic threshold is used to capture novel concepts, and an effective active label querying strategy is introduced to continuously learn these new concepts from the data streams. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies show that the proposed method achieves performance statistically better than or comparable to that of existing methods.
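To make the three mechanisms concrete (novelty detection with a dynamic threshold, active label querying, and sub-clusters for class overlap), here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It swaps the paper's density-based stream clustering for a simpler distance-based rule, and it assumes the dynamic threshold `T` is `factor` times the running mean distance between absorbed points and their nearest sub-cluster center. A point farther than `T` from every sub-cluster is treated as a novel concept: the oracle is queried and a new cluster is created. A point in the band between `T/2` and `T` is also queried and becomes a new labeled sub-cluster of the nearest cluster, so a single cluster can hold sub-clusters of different classes where those classes overlap. Any other point reuses the label of its nearest sub-cluster without a query. The names `StreamClassifier`, `Cluster`, `SubCluster`, the `oracle` callback, and the parameters `initial_threshold`, `factor`, and `floor` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class SubCluster:
    """A labeled region inside a cluster, summarized by its running mean."""
    center: List[float]
    label: str
    count: int = 1

    def absorb(self, point: Vector) -> None:
        # Incremental mean update: c <- c + (x - c) / n
        self.count += 1
        for i, x in enumerate(point):
            self.center[i] += (x - self.center[i]) / self.count


@dataclass
class Cluster:
    """A cluster whose sub-clusters may carry different class labels (overlap)."""
    subclusters: List[SubCluster] = field(default_factory=list)


class StreamClassifier:
    """Hypothetical clustering-based stream classifier with active label queries."""

    def __init__(self, oracle: Callable[[Vector], str], initial_threshold: float = 1.0,
                 factor: float = 4.0, floor: float = 1e-3) -> None:
        self.oracle = oracle  # stands in for a human annotator
        self.clusters: List[Cluster] = []
        self.threshold = initial_threshold
        self.factor = factor
        self.floor = floor
        self.queries = 0
        self._dist_sum = 0.0
        self._dist_n = 0

    def _query(self, point: Vector) -> str:
        self.queries += 1
        return self.oracle(point)

    def _update_threshold(self, d: float) -> None:
        # Assumed rule: T = factor * running mean distance of absorbed points.
        self._dist_sum += d
        self._dist_n += 1
        self.threshold = max(self.floor, self.factor * self._dist_sum / self._dist_n)

    def process(self, point: Vector) -> str:
        """Label one arriving point, querying the oracle only when needed."""
        # Find the nearest sub-cluster across all clusters.
        best_cluster, best_sub, d = None, None, math.inf
        for cluster in self.clusters:
            for sub in cluster.subclusters:
                dist = euclidean(sub.center, point)
                if dist < d:
                    best_cluster, best_sub, d = cluster, sub, dist

        if best_sub is None or d > self.threshold:
            # Novel concept: far from everything seen so far -> new cluster.
            label = self._query(point)
            self.clusters.append(Cluster([SubCluster(list(point), label)]))
        elif d > self.threshold / 2:
            # Inside a cluster but away from its labeled cores: possible
            # class overlap -> query and add a new labeled sub-cluster.
            label = self._query(point)
            best_cluster.subclusters.append(SubCluster(list(point), label))
        else:
            # Close to a labeled core: reuse its label without a query.
            label = best_sub.label
            best_sub.absorb(point)
            self._update_threshold(d)
        return label
```

Calling `process` on each arriving point returns a label and increments `queries` only for novel or ambiguous points, so `queries` divided by the number of processed points gives a rough measure of labeling cost. Using `T/2` as the boundary of the sub-cluster band is an arbitrary illustrative choice, not a value taken from the paper.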