Describing Nonstationary Data Streams in Frequency Domain
This work addresses concept drift in data stream processing and offers a tool for drift detection and concept grouping. The contribution appears incremental, since it builds on existing metafeature analysis.
The authors address concept drift in data streams by introducing the Frequency Filtering Metadescriptor, which characterizes a stream through informative frequency components selected by their variance across batches. In the experiments, this led to improved concept identification compared to state-of-the-art methods and a PCA baseline.
Concept drift is among the primary challenges faced by data stream processing methods. Drift detection strategies, designed to counteract the negative consequences of such changes, often rely on analyzing the metafeatures of the problem. This work presents the Frequency Filtering Metadescriptor, a tool for characterizing a data stream that searches for informative frequency components visible in each sample's feature vector. The frequencies are filtered according to their variance across all available data batches. The presented solution can generate a metadescription of the data stream, use it to separate chunks into groups describing specific concepts, and visualize the selected frequencies in the original spatial domain. The experimental analysis compares the proposed solution with two state-of-the-art strategies and with a PCA baseline in the post-hoc concept identification task. The study concludes with the identification of concepts in real-world data streams. The generalization in the frequency domain adopted in the proposed solution makes it possible to capture complex feature dependencies with a reduced number of frequency components while maintaining the semantic meaning of the data.
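To make the idea of variance-based frequency filtering concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each sample's feature vector is treated as a one-dimensional signal, that a chunk is summarized by its mean magnitude spectrum, and that the components with the highest variance across chunks are kept. The function names, the parameter `n_keep`, and the synthetic two-concept stream are hypothetical and introduced only for illustration.

```python
import numpy as np

def chunk_spectra(chunks):
    """Mean magnitude spectrum per chunk (assumption: 1-D FFT over the feature axis)."""
    return np.stack([np.abs(np.fft.rfft(X, axis=1)).mean(axis=0) for X in chunks])

def select_frequencies(spectra, n_keep):
    """Indices of the n_keep frequency components with the highest variance across chunks."""
    variances = spectra.var(axis=0)
    return np.sort(np.argsort(variances)[-n_keep:])

def metadescription(chunks, n_keep=2):
    """Describe each chunk by the selected frequency components only."""
    spectra = chunk_spectra(chunks)
    kept = select_frequencies(spectra, n_keep)
    return spectra[:, kept], kept

# Hypothetical synthetic stream: 10 chunks of 50 samples with 16 features.
# Chunks 0-4 follow a pattern at frequency bin 2, chunks 5-9 at frequency bin 5.
rng = np.random.default_rng(0)
j = np.arange(16)
pattern_a = np.sin(2 * np.pi * 2 * j / 16)
pattern_b = np.sin(2 * np.pi * 5 * j / 16)
chunks = [
    (pattern_a if i < 5 else pattern_b) + rng.normal(0.0, 0.1, size=(50, 16))
    for i in range(10)
]

desc, kept = metadescription(chunks, n_keep=2)
print("selected frequency bins:", kept)
print("metadescription shape:", desc.shape)
```

On this synthetic stream the two planted frequency bins vary most across chunks, so they are the ones retained, and the resulting two-column description separates the first five chunks from the last five. Grouping the rows of `desc` with any clustering method would then correspond to the post-hoc concept identification step described above.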