Adaptive XGBoost for Evolving Data Streams
This work addresses concept drift in data streams, a problem relevant to applications such as real-time analytics. Its contribution is incremental, however, since it adapts an existing method to a new setting.
The authors tackle the classification of evolving data streams with concept drift by adapting XGBoost. The resulting method keeps updating its ensemble as new data arrives and performs competitively against existing batch-incremental and instance-incremental methods.
Boosting is an ensemble method that combines base models sequentially to achieve high predictive accuracy. A popular learning algorithm based on boosting is eXtreme Gradient Boosting (XGB). We present an adaptation of XGB for the classification of evolving data streams. In this setting, new data arrives over time and the relationship between the class and the features may change along the way, a phenomenon known as concept drift. The proposed method creates new ensemble members from mini-batches of data as new data becomes available. The maximum ensemble size is fixed, but learning does not stop once this size is reached: the ensemble continues to be updated on new data so that it stays consistent with the current concept. We also explore using concept drift detection to trigger an ensemble update mechanism. We test our method on real and synthetic data with concept drift and compare it against batch-incremental and instance-incremental classification methods for data streams.
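The update scheme described above (one new member per mini-batch, a fixed maximum ensemble size, continued learning once that size is reached, and an optional drift-triggered update) can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and every name in it (Stump, fit_stump, StreamingBoostedEnsemble, make_batch, simulate) is hypothetical. It assumes: each member is a depth-1 tree whose split gain and leaf weights follow XGBoost's second-order formulas (without the gamma penalty), fitted to logistic-loss gradients at the current ensemble margin; each mini-batch is first used to evaluate the ensemble and then to train it (test-then-train); when the ensemble is full, the oldest member is dropped; and drift is flagged by a deliberately crude rule (mini-batch error exceeding the lowest error seen since the last reset by a fixed margin), standing in for a proper detector such as ADWIN, after which the ensemble is cleared.

```python
import math
import random
from collections import deque


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Stump:
    """Depth-1 tree whose leaves hold XGBoost-style weights -G / (H + lambda)."""

    def __init__(self, feature, threshold, w_left, w_right):
        self.feature, self.threshold = feature, threshold
        self.w_left, self.w_right = w_left, w_right

    def predict(self, x):
        return self.w_left if x[self.feature] <= self.threshold else self.w_right


def fit_stump(X, g, h, lam=1.0):
    """Choose the split maximising G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)."""
    G, H = sum(g), sum(h)
    best = (-math.inf, 0, 0.0, 0.0, 0.0)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            gl = sum(gi for x, gi in zip(X, g) if x[f] <= t)
            hl = sum(hi for x, hi in zip(X, h) if x[f] <= t)
            gr, hr = G - gl, H - hl
            gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
            if gain > best[0]:
                best = (gain, f, t, -gl / (hl + lam), -gr / (hr + lam))
    _, f, t, w_left, w_right = best
    return Stump(f, t, w_left, w_right)


class StreamingBoostedEnsemble:
    """Hypothetical mini-batch booster with a fixed maximum size and a toy drift check."""

    def __init__(self, max_size=10, learning_rate=0.3, drift_margin=0.2):
        self.members = deque(maxlen=max_size)  # appending when full drops the oldest member
        self.learning_rate = learning_rate
        self.drift_margin = drift_margin  # hypothetical threshold for the toy detector
        self.best_error = 1.0  # lowest mini-batch error seen since the last reset

    def margin(self, x):
        return sum(self.learning_rate * m.predict(x) for m in self.members)

    def predict(self, x):
        return 1 if self.margin(x) > 0 else 0

    def partial_fit(self, X, y):
        """Test-then-train on one mini-batch; returns True if drift was signalled."""
        # 1. Evaluate the current ensemble on the new batch before learning from it.
        error = sum(self.predict(x) != yi for x, yi in zip(X, y)) / len(y)
        drift = bool(self.members) and error > self.best_error + self.drift_margin
        if drift:
            self.members.clear()  # discard members fitted to the old concept
            self.best_error = 1.0
        else:
            self.best_error = min(self.best_error, error)
        # 2. Logistic-loss gradients and hessians at the current ensemble margin.
        p = [sigmoid(self.margin(x)) for x in X]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        # 3. Fit one new member to them and append it to the ensemble.
        self.members.append(fit_stump(X, g, h))
        return drift


def make_batch(rng, n, flipped):
    """Two uniform features; the label is 1 iff x0 > 0.5, inverted after the drift."""
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int((x[0] > 0.5) != flipped) for x in X]
    return X, y


def simulate(n_batches=40, drift_at=20, batch_size=50, seed=0):
    """Run a synthetic stream whose concept flips abruptly at batch `drift_at`."""
    rng = random.Random(seed)
    model = StreamingBoostedEnsemble(max_size=10)
    drifts = []
    for t in range(n_batches):
        X, y = make_batch(rng, batch_size, flipped=t >= drift_at)
        if model.partial_fit(X, y):
            drifts.append(t)
    return model, drifts


if __name__ == "__main__":
    model, drifts = simulate()
    print("drift signalled at batches:", drifts)
```

On this synthetic stream the label rule flips at batch 20, so the toy check should flag drift once the flipped batches arrive, after which the ensemble is rebuilt on the new concept. Using a deque with maxlen gives the fixed maximum size without halting learning; the cost is that later members were fitted to gradients that still included the member being dropped, a trade-off this sketch simply accepts.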