LG, ML · Jun 24, 2020

Ensuring Learning Guarantees on Concept Drift Detection with Statistical Learning Theory

arXiv:2006.14079v11.2

Originality: Synthesis-oriented

AI Analysis

This work addresses unreliable drift detection, a problem that affects researchers in data stream modeling. It is incremental: it builds on existing theory to improve how drift detectors are evaluated.

The paper tackles the lack of learning guarantees in concept drift detection algorithms. It uses Statistical Learning Theory to formalize the requirements for probabilistic bounds, so that detected drifts reflect actual changes in the data rather than chance, and it assesses existing algorithms under this methodology.

Concept Drift (CD) detection aims to continuously identify changes in the behavior of data streams, supporting researchers in the study and modeling of real-world phenomena. Motivated by the lack of learning guarantees in current CD algorithms, we take advantage of Statistical Learning Theory (SLT) to formalize the requirements needed to ensure probabilistic learning bounds, so that detected drifts correspond to actual changes in the data rather than to chance. As discussed throughout this paper, a set of mathematical assumptions must hold in order to rely on SLT bounds, and these assumptions are especially controversial in CD scenarios. To address this issue, we propose a methodology that handles those assumptions in CD scenarios and therefore ensures learning guarantees. Complementarily, we assess a set of relevant, well-known CD algorithms from the literature in light of our methodology. As our main contribution, we expect this work to support researchers in designing and evaluating CD algorithms in different domains.
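The abstract does not name a specific bound, so the following is only an illustration of what "drifts refer to actual changes rather than chance" can mean in practice, not the authors' methodology. It uses Hoeffding's inequality, a bound commonly used by window-based detectors, and it only holds if the samples in each window are i.i.d. and bounded in [0, 1]; that is exactly the kind of assumption the abstract calls controversial in CD scenarios. The function names (`hoeffding_epsilon`, `drift_detected`, `min_window_size`) are hypothetical.

```python
import math
from typing import Sequence


def hoeffding_epsilon(n: int, delta: float) -> float:
    """Half-width eps such that P(|sample mean - true mean| >= eps) <= delta.

    Assumes n i.i.d. samples bounded in [0, 1] (two-sided Hoeffding inequality).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def min_window_size(eps: float, delta: float) -> int:
    """Smallest n for which hoeffding_epsilon(n, delta) <= eps."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def drift_detected(ref: Sequence[float], cur: Sequence[float],
                   delta: float = 0.05) -> bool:
    """Flag a drift only if the window means differ by more than chance allows.

    Each window receives delta / 2 (union bound), so if both windows are i.i.d.
    draws from the same distribution, the false-alarm probability is <= delta.
    """
    eps_ref = hoeffding_epsilon(len(ref), delta / 2)
    eps_cur = hoeffding_epsilon(len(cur), delta / 2)
    mean_ref = sum(ref) / len(ref)
    mean_cur = sum(cur) / len(cur)
    return abs(mean_ref - mean_cur) > eps_ref + eps_cur


if __name__ == "__main__":
    # A large shift between two large windows exceeds the bound.
    print(drift_detected([0.2] * 500, [0.8] * 500))
    # The same shift with only 10 samples per window does not.
    print(drift_detected([0.2] * 10, [0.8] * 10))
    # Window size needed for a +/-0.05 guarantee at 95% confidence.
    print(min_window_size(0.05, 0.05))
```

The second call illustrates the point of the abstract: a visible change in the data is not reported as a drift when the windows are too small for the bound to rule out chance. If the i.i.d. assumption fails (e.g. temporally dependent samples), the stated false-alarm probability no longer holds.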

