LG · ML · May 15, 2020

Adaptive XGBoost for Evolving Data Streams

arXiv:2005.07353v1 · 53 citations
AI Analysis

This work addresses concept drift in data streams, a problem relevant to applications such as real-time analytics. Its contribution is incremental: it adapts an existing method (XGBoost) to a new setting.

The authors adapt XGBoost to classify evolving data streams with concept drift. The resulting method keeps updating its ensemble on new data and shows competitive performance against existing incremental methods.

Boosting is an ensemble method that combines base models in a sequential manner to achieve high predictive accuracy. A popular learning algorithm based on this ensemble method is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change in the process, thus exhibiting concept drift. The proposed method creates new members of the ensemble from mini-batches of data as new data becomes available. The maximum ensemble size is fixed, but learning does not stop when this size is reached because the ensemble is updated on new data to ensure consistency with the current concept. We also explore the use of concept drift detection to trigger a mechanism to update the ensemble. We test our method on real and synthetic data with concept drift and compare it against batch-incremental and instance-incremental classification methods for data streams.
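The mini-batch update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the depth-1 stump used in place of an XGB tree, the logistic-loss residuals, and the policy of dropping the oldest member once the ensemble is full are all assumptions made for illustration. The abstract states only that the ensemble size is capped and that the ensemble keeps being updated on new data.

```python
import math
from collections import deque


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Stump:
    """Depth-1 regression tree; a hypothetical stand-in for an XGB base learner."""

    def fit(self, X, residuals):
        best = None
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                left = [r for x, r in zip(X, residuals) if x[j] <= t]
                right = [r for x, r in zip(X, residuals) if x[j] > t]
                lv = sum(left) / len(left) if left else 0.0
                rv = sum(right) / len(right) if right else 0.0
                err = sum((r - (lv if x[j] <= t else rv)) ** 2
                          for x, r in zip(X, residuals))
                if best is None or err < best[0]:
                    best = (err, j, t, lv, rv)
        _, self.feature, self.threshold, self.left_value, self.right_value = best
        return self

    def predict(self, x):
        if x[self.feature] <= self.threshold:
            return self.left_value
        return self.right_value


class StreamingBoostedEnsemble:
    """Binary classifier that adds one boosted member per mini-batch."""

    def __init__(self, max_size=5, batch_size=20, learning_rate=0.5):
        self.max_size = max_size
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.members = deque()
        self._X, self._y = [], []

    def raw_score(self, x):
        return self.learning_rate * sum(m.predict(x) for m in self.members)

    def predict(self, x):
        return 1 if self.raw_score(x) > 0 else 0

    def partial_fit(self, x, y):
        # Buffer instances until a full mini-batch is available.
        self._X.append(x)
        self._y.append(y)
        if len(self._X) >= self.batch_size:
            self._train_on_batch(self._X, self._y)
            self._X, self._y = [], []

    def _train_on_batch(self, X, y):
        # Assumed update policy: once full, drop the oldest member so the
        # ensemble size stays fixed while learning continues.
        if len(self.members) == self.max_size:
            self.members.popleft()
        # Fit the new member to the residuals of the remaining ensemble
        # (negative gradient of the logistic loss).
        residuals = [yi - sigmoid(self.raw_score(xi)) for xi, yi in zip(X, y)]
        self.members.append(Stump().fit(X, residuals))
```

Calling `partial_fit(x, y)` once per arriving instance is enough to drive the sketch. Because old members are eventually replaced by members fitted to recent mini-batches, the ensemble follows the current concept after a drift. A real system would use XGBoost trees as base learners rather than the toy stump shown here.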

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
