LG | ML | Sep 25, 2019

Online Semi-Supervised Concept Drift Detection with Density Estimation

arXiv:1909.11251v2 | 5 citations
Originality: Incremental advance
AI Analysis

This paper addresses concept drift detection for streaming data applications where labels are scarce. The contribution is incremental, as it builds on existing methods.

The paper tackles the problem of detecting real concept drift in streaming environments with limited labeled data. It proposes a semi-supervised framework that uses density estimation of posterior probabilities and achieves prediction performance comparable to state-of-the-art methods.

Concept drift is formally defined as a change in the joint distribution of a set of input variables X and a target variable y. The two most extensively studied types of drift are real drift and virtual drift: the former is a change in the posterior probabilities p(y|X), while the latter is a change in the distribution of X that does not affect the posterior probabilities. Many approaches to concept drift detection either assume that the labels y are fully available or handle only virtual drift. In a streaming environment, the assumption of fully available labels is questionable, while approaches designed for virtual drift fail to address real drift. Rather than improving the state-of-the-art methods, this paper presents a semi-supervised framework to deal with the challenges above. The objective of the proposed framework is to learn from a streaming environment with limited labels y and detect real drift concurrently. This paper proposes a novel concept drift detection method that uses the densities of posterior probabilities in partially labeled streaming environments. Experimental results on both synthetic and real-world datasets show that our proposed semi-supervised framework enables the detection of concept drift in such environments while achieving prediction performance comparable to the state-of-the-art methods.
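The distinction between virtual and real drift can be illustrated with a small simulation. The sketch below is not the paper's method; it is a toy example under stated assumptions: a one-dimensional Gaussian input, noiseless labels given by a threshold, and a frozen classifier that was fit to the original concept. The names `sample_stream`, `frozen_model`, and `accuracy` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_stream(n, x_mean, boundary):
    """Draw n points with X ~ N(x_mean, 1) and noiseless labels y = 1[x > boundary]."""
    x = rng.normal(loc=x_mean, scale=1.0, size=n)
    y = (x > boundary).astype(int)
    return x, y

def frozen_model(x):
    """Classifier fit to the original concept: predicts 1 when x > 0 (assumed, never retrained)."""
    return (x > 0.0).astype(int)

def accuracy(x, y):
    """Fraction of points on which the frozen classifier agrees with the true label."""
    return float(np.mean(frozen_model(x) == y))

# Original concept: X ~ N(0, 1), decision boundary at 0.
x0, y0 = sample_stream(5000, x_mean=0.0, boundary=0.0)
# Virtual drift: p(X) shifts, p(y|X) is unchanged.
xv, yv = sample_stream(5000, x_mean=1.5, boundary=0.0)
# Real drift: p(X) is unchanged, p(y|X) changes because the boundary moves.
xr, yr = sample_stream(5000, x_mean=0.0, boundary=1.0)

acc_base = accuracy(x0, y0)
acc_virtual = accuracy(xv, yv)
acc_real = accuracy(xr, yr)

print("input mean  (base, virtual, real):", x0.mean(), xv.mean(), xr.mean())
print("accuracy    (base, virtual, real):", acc_base, acc_virtual, acc_real)
```

In this toy setting the virtual drift is visible in the inputs alone, but it does not hurt the frozen classifier. The real drift leaves the input distribution untouched and shows up only when predictions are compared against labels, which is why scarce labels make real drift hard to detect.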

