LG | ML | Dec 4, 2019

A probability theoretic approach to drifting data in continuous time domains

arXiv:1912.01969v1 | 6 citations
Originality Highly original
AI Analysis

This work addresses the challenge of inconsistent formalizations of drift in machine learning, providing a foundational framework that could benefit researchers and practitioners dealing with time-varying data.

The authors tackle the problem of formalizing data drift in continuous time by developing a probability-theoretic framework that unifies existing notions of drift and yields a new characterization based on stochastic dependency between data and time. This characterization leads to an efficient drift detection method and to a technique for decomposing data into drifting and non-drifting parts.
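The characterization mentioned above can be written compactly. The notation below is an assumption made for illustration (the symbols $X$, $T$, $\mathcal{D}_t$ do not appear in this summary); it is a sketch of the idea that drift means dependence between data and time, not a quotation of the paper's definitions.

```latex
% Assumed notation: (X, T) is a data/time pair with joint law P_{(X,T)},
% and D_t denotes the conditional law of X given T = t.
\[
  \mathcal{D}_t \;=\; P_{X \mid T = t}, \qquad t \in \mathcal{T}.
\]
% "No drift" means the data distribution is the same at (almost) every time:
\[
  \mathcal{D}_t = \mathcal{D}_s \quad \text{for } P_T\text{-almost all } s, t \in \mathcal{T}.
\]
% This is equivalent to the joint law factorizing, i.e. X and T being independent:
\[
  P_{(X,T)} \;=\; P_X \otimes P_T
  \quad\Longleftrightarrow\quad
  X \perp\!\!\!\perp T .
\]
% Drift is then the failure of this factorization.
```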

Drift refers to the phenomenon that the distribution underlying the observed data changes over time. Although many attempts have been made to deal with drift, formal notions of drift are application-dependent and are formulated at varying degrees of abstraction and mathematical coherence. In this contribution, we provide a probability-theoretic framework that allows a formalization of drift in continuous time and subsumes popular notions of drift. In particular, it sheds some light on common practices such as change-point detection and machine learning methodologies in the presence of drift. It gives rise to a new characterization of drift in terms of stochastic dependency between data and time. This particularly intuitive formalization enables us to design a new, efficient drift detection method. Further, it induces a technique for decomposing observed data into a drifting and a non-drifting part.
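If drift is understood as stochastic dependency between data and time, then one way to look for it is to run a statistical independence test on pairs of observations and their timestamps. The following is a minimal sketch of that idea, not the authors' detector: the choice of HSIC with Gaussian kernels, the median-heuristic bandwidth, the permutation test, and the function names `gaussian_gram`, `hsic_statistic`, and `drift_permutation_test` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(z, bandwidth=None):
    """Gaussian-kernel Gram matrix of the rows of z (1-D input is treated as a column)."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:
        # Median heuristic on the non-zero squared distances (an assumed default).
        positive = sq_dists[sq_dists > 0]
        bandwidth = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def hsic_statistic(K_centered, L):
    """Biased empirical HSIC, given a doubly centered Gram matrix and a second Gram matrix."""
    n = L.shape[0]
    return float(np.sum(K_centered * L)) / n ** 2


def drift_permutation_test(x, t, n_permutations=200, seed=0):
    """Test independence of data x and timestamps t; a small p-value suggests drift."""
    rng = np.random.default_rng(seed)
    K = gaussian_gram(x)
    L = gaussian_gram(t)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K_centered = H @ K @ H
    observed = hsic_statistic(K_centered, L)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the timestamps breaks any dependence between data and time.
        perm = rng.permutation(n)
        if hsic_statistic(K_centered, L[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    t = rng.uniform(0.0, 1.0, size=150)
    # Synthetic stream whose mean jumps halfway through the observation window.
    x = rng.normal(size=150) + 3.0 * (t > 0.5)
    stat, p = drift_permutation_test(x, t)
    print(f"HSIC statistic: {stat:.4f}, permutation p-value: {p:.4f}")
```

Because the test only asks whether data and time are dependent, it needs no assumption about when or how the distribution changes; the price is a cost that is quadratic in the number of samples, so a practical detector would apply it to a bounded window.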

Code Implementations: 1 repo