On-the-Fly Ensemble Pruning in Evolving Data Streams
This work addresses the challenge of efficient ensemble management for data stream classification, particularly in imbalanced scenarios, and represents an incremental advance in a largely unexplored area.
The paper tackles the problem of ensemble pruning in evolving data streams by proposing CCRP, an on-the-fly method that uses imbalance-aware fusion of class-wise rankings to select the best classifiers per class, resulting in performance on par with or superior to the original ensembles with 20% to 90% less average memory consumption.
Ensemble pruning is the process of selecting a subset of component classifiers from an ensemble that performs at least as well as the original ensemble while reducing storage and computational costs. Ensemble pruning in data streams is a largely unexplored area of research. It requires analysis of ensemble components as they are running on the stream, and differentiation of useful classifiers from redundant ones. We present CCRP, an on-the-fly ensemble pruning method for multi-class data stream classification empowered by an imbalance-aware fusion of class-wise component rankings. CCRP aims to ensure that the resulting pruned ensemble contains the best performing classifier for each target class and hence reduces the effects of class imbalance. The conducted experiments on real-world and synthetic data streams demonstrate that different types of ensembles that integrate CCRP as their pruning scheme consistently yield on par or superior performance with 20% to 90% less average memory consumption. Lastly, we validate the proposed pruning scheme by comparing our approach against pruning schemes based on ensemble weights and basic rank fusion methods.
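To make the idea of fusing class-wise rankings concrete, the following is a minimal sketch and not the CCRP algorithm itself, whose details the abstract does not give. It assumes that each component classifier already has a per-class score (for example, a running per-class recall), that rarer classes receive larger fusion weights via inverse class frequency, and that the fusion is a weighted Borda-style sum of ranks. The function names, the weighting rule, and the `keep` parameter are all hypothetical choices made for illustration.

```python
def rank_per_class(scores):
    """scores[c][k] is the score of classifier k on class c.
    Returns ranks[c][k], where rank 0 is the best classifier for class c."""
    ranks = {}
    for cls, per_clf in scores.items():
        order = sorted(range(len(per_clf)), key=lambda k: -per_clf[k])
        r = [0] * len(per_clf)
        for pos, k in enumerate(order):
            r[k] = pos
        ranks[cls] = r
    return ranks


def prune_by_class_rank_fusion(scores, class_counts, keep):
    """Hypothetical pruning rule: keep the top-ranked classifier of every
    class, then fill up to `keep` members by imbalance-weighted fused rank."""
    ranks = rank_per_class(scores)
    n_classifiers = len(next(iter(scores.values())))
    total = sum(class_counts.values())

    # Assumed imbalance-aware weights: inverse class frequency,
    # so minority classes influence the fused ranking more.
    weights = {c: total / class_counts[c] for c in scores}

    # Step 1: the best classifier of each class is always retained,
    # even if that exceeds `keep`.
    selected = []
    for c in scores:
        best = ranks[c].index(0)
        if best not in selected:
            selected.append(best)

    # Step 2: weighted Borda-style fusion (lower fused rank is better).
    fused = [
        sum(weights[c] * ranks[c][k] for c in scores)
        for k in range(n_classifiers)
    ]
    for k in sorted(range(n_classifiers), key=lambda k: fused[k]):
        if len(selected) >= keep:
            break
        if k not in selected:
            selected.append(k)
    return sorted(selected)


if __name__ == "__main__":
    # Four classifiers, two classes; class "b" is the minority class.
    scores = {"a": [0.9, 0.5, 0.7, 0.6], "b": [0.2, 0.8, 0.3, 0.6]}
    class_counts = {"a": 900, "b": 100}
    print(prune_by_class_rank_fusion(scores, class_counts, keep=3))  # [0, 1, 3]
```

In this toy run, classifiers 0 and 1 are retained because each is the best on one class, and classifier 3 fills the remaining slot because it ranks well on the minority class, which the assumed weights favor. In a streaming setting the scores and class counts would be updated incrementally (for instance over a sliding window) and the pruning step re-run periodically; that scheduling is likewise an assumption here.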