LG · Feb 19, 2022

Suitability of Different Metric Choices for Concept Drift Detection

arXiv:2202.09486v1 · 18 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of selecting effective metrics for unsupervised concept drift detection, which is crucial for maintaining model accuracy when the data distribution changes over time. The contribution is incremental, as it builds on existing methods.

The paper analyzes how the choice of metric affects concept drift detection, compares various estimators and metrics both theoretically and empirically, and proposes new metric choices that it validates through experiments.

The notion of concept drift refers to the phenomenon that the distribution underlying the observed data changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. Many unsupervised approaches for drift detection rely on measuring the discrepancy between the sample distributions of two time windows. This may be done directly, after some preprocessing (feature extraction, embedding into a latent space, etc.), or with respect to inferred features (mean, variance, conditional probabilities, etc.). Most drift detection methods can be distinguished by the metric they use, how this metric is estimated, and how the decision threshold is found. In this paper, we analyze structural properties of the drift-induced signals in the context of different metrics. We compare different types of estimators and metrics theoretically and empirically and investigate the relevance of the individual metric components. In addition, we propose new metric choices and demonstrate their suitability in several experiments.
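The abstract splits a two-window drift detector into three components: a metric, an estimator for it, and a decision threshold. The following is a minimal sketch of that decomposition under our own assumptions, not the authors' method: two fixed windows, a discrepancy on an inferred feature (distance between window means) versus a kernel two-sample statistic (squared MMD with a Gaussian kernel, biased estimator), and a permutation test as the threshold. All names and parameters (`mean_distance`, `mmd2_rbf`, `permutation_test`, `detect_drift`, `gamma`, `n_perm`, `alpha`) are hypothetical. The usage part constructs a variance-only drift in which both windows have identical means, to show why the metric choice matters.

```python
import numpy as np

# Hypothetical illustration of a two-window drift detector; not the paper's method.

def mean_distance(x, y):
    """Metric on an inferred feature: Euclidean distance between window means."""
    return float(np.linalg.norm(x.mean(axis=0) - y.mean(axis=0)))

def mmd2_rbf(x, y, gamma=0.5):
    """Biased (V-statistic) estimate of squared MMD with kernel exp(-gamma * ||a - b||^2)."""
    z = np.vstack([x, y])
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T   # pairwise squared distances
    k = np.exp(-gamma * np.maximum(d2, 0.0))          # clip tiny negative rounding errors
    n = len(x)
    return float(k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean())

def permutation_test(x, y, metric, n_perm=200, seed=0):
    """p-value of metric(x, y) under 'no drift', obtained by shuffling window membership."""
    rng = np.random.default_rng(seed)
    observed = metric(x, y)
    z = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(z))
        if metric(z[idx[:n]], z[idx[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def detect_drift(x, y, metric, alpha=0.05, **kwargs):
    """Decision rule: flag drift when the permutation p-value falls below alpha."""
    return permutation_test(x, y, metric, **kwargs) < alpha

# Variance-only drift: both windows are centered, so their means coincide.
rng = np.random.default_rng(42)
x_ref = rng.normal(size=(200, 2))
x_ref -= x_ref.mean(axis=0)
x_cur = 3.0 * rng.normal(size=(200, 2))   # same (zero) mean, nine times the variance
x_cur -= x_cur.mean(axis=0)

p_mean = permutation_test(x_ref, x_cur, mean_distance)
p_mmd = permutation_test(x_ref, x_cur, mmd2_rbf)
print(f"mean-based metric p={p_mean:.3f}, MMD p={p_mmd:.3f}")
```

The mean-based statistic is structurally blind to this drift, whereas the kernel statistic detects it. Because the three components are separate functions, swapping the biased MMD estimate for an unbiased one, or the permutation test for a fixed threshold, changes only one part of the detector.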
