LG · Aug 26, 2024

A Synthetic Benchmark to Explore Limitations of Localized Drift Detections

arXiv:2408.14687v1 · h-index: 14 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of detecting localized drift, a concern for practitioners in data stream mining. Its contribution is incremental, since it builds on existing drift detection techniques.

The paper tackles the problem of localized concept drift in data streams, where drift affects only specific subpopulations, and finds that common drift detection methods often fail to detect such localized changes, as demonstrated through experiments on a synthetic dataset.

Concept drift is a common phenomenon in data streams where the statistical properties of the target variable change over time. Traditionally, drift is assumed to occur globally, affecting the entire dataset uniformly. However, this assumption does not always hold true in real-world scenarios where only specific subpopulations within the data may experience drift. This paper explores the concept of localized drift and evaluates the performance of several drift detection techniques in identifying such localized changes. We introduce a synthetic dataset based on the Agrawal generator, where drift is induced in a randomly chosen subgroup. Our experiments demonstrate that commonly adopted drift detection methods may fail to detect drift when it is confined to a small subpopulation. We propose and test various drift detection approaches to quantify their effectiveness in this localized drift scenario. We make the source code for the generation of the synthetic benchmark available at https://github.com/fgiobergia/subgroup-agrawal-drift.
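To make the abstract's central claim concrete, the following is a minimal sketch of why drift confined to a small subgroup can be hard to see in a global error signal. It is not the paper's benchmark and does not use the Agrawal generator: the function names, the 5% subgroup size, the 10% noise rate, and the idea of monitoring a fixed pre-drift model's error rate are all assumptions chosen for illustration.

```python
import random


def simulate_errors(n=20000, drift_at=10000, subgroup_frac=0.05,
                    noise=0.1, seed=0):
    """Simulate the errors of a fixed model trained on the pre-drift concept.

    Hypothetical setup: each instance falls in the subgroup with probability
    `subgroup_frac`. The model errs with probability `noise`, except on
    subgroup instances after `drift_at`, where the concept has flipped and
    the model errs with probability 1 - noise.
    """
    rng = random.Random(seed)
    records = []
    for t in range(n):
        in_subgroup = rng.random() < subgroup_frac
        drifted = t >= drift_at and in_subgroup
        p_error = 1.0 - noise if drifted else noise
        records.append((t, in_subgroup, rng.random() < p_error))
    return records


def error_shift(records, drift_at, subgroup_only=False):
    """Return (error rate after drift_at) - (error rate before drift_at)."""
    def keep(in_subgroup):
        return in_subgroup or not subgroup_only

    before = [err for t, sub, err in records if t < drift_at and keep(sub)]
    after = [err for t, sub, err in records if t >= drift_at and keep(sub)]
    return sum(after) / len(after) - sum(before) / len(before)


records = simulate_errors()
global_shift = error_shift(records, drift_at=10000)
subgroup_shift = error_shift(records, drift_at=10000, subgroup_only=True)

print(f"global error shift:   {global_shift:.3f}")
print(f"subgroup error shift: {subgroup_shift:.3f}")
```

Under these assumptions the expected global shift is roughly the subgroup fraction times the within-subgroup shift (about 0.05 × 0.8), so a detector watching only the global error rate sees a small change, while the same statistic restricted to the affected subgroup changes sharply. In practice the drifting subgroup is not known in advance, which is part of what makes the localized setting difficult.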
