Can Evolutionary Sampling Improve Bagged Ensembles?
This work addresses the problem of improving ensemble accuracy for researchers and practitioners in machine learning, but it appears incremental as it builds on existing bagging methods with a novel sampling approach.
The paper introduces Evolutionary Sampling (ES), a new family of methods under the Perturb and Combine framework, which uses evolutionary algorithms to optimize sampling in feature spaces and training data for bagged ensembles, and empirically compares its performance against randomized sampling.
The Perturb and Combine (P&C) group of methods generates multiple versions of a predictor by perturbing the training set or the construction method and then combines them into a single predictor (Breiman, 1996b). The motivation is to improve accuracy for unstable classification and regression methods. One of the best-known methods in this group is Bagging. Arcing, or Adaptive Resampling and Combining, methods such as AdaBoost are smarter variants of P&C methods. In this extended abstract, we lay the groundwork for a new family of methods under the P&C umbrella, known as Evolutionary Sampling (ES). We employ evolutionary algorithms to suggest smarter sampling of both the feature space (sub-spaces) and the training samples. We discuss multiple fitness functions for assessing ensembles and empirically compare our performance against randomized sampling of training data and feature sub-spaces.
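To make the idea concrete, the following is a minimal sketch of how an evolutionary search over bootstrap samples and feature sub-spaces might look. It is not the method evaluated in this work: the synthetic data, the decision-stump base learner, the mutation rates, the (mu + lambda) selection scheme, and the fitness function (validation accuracy of a single ensemble member) are all assumptions chosen for illustration, and every function name is hypothetical. The initial population corresponds to plain randomized sampling, so it doubles as the bagging baseline.

```python
import random

def make_data(rng, n=150, n_features=6):
    """Synthetic binary data; only features 0 and 1 carry signal (assumption)."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if row[0] + 0.5 * row[1] > 0 else 0 for row in X]
    return X, y

def predict(stump, row):
    f, t, sign = stump
    return 1 if sign * (row[f] - t) > 0 else 0

def fit_stump(X, y, rows, feats):
    """Best single-feature threshold rule on the sampled rows and features."""
    best = None
    for f in feats:
        for r in rows:
            for sign in (1, -1):
                stump = (f, X[r][f], sign)
                correct = sum(predict(stump, X[i]) == y[i] for i in rows)
                if best is None or correct > best[0]:
                    best = (correct, stump)
    return best[1]

def vote_accuracy(stumps, X, y, rows):
    """Accuracy of the majority vote of `stumps` on the given rows."""
    hits = 0
    for i in rows:
        votes = sum(predict(s, X[i]) for s in stumps)
        hits += (1 if 2 * votes > len(stumps) else 0) == y[i]
    return hits / len(rows)

def evolve(X, y, train, valid, rng, pop_size=8, sample_size=40,
           n_feats=2, generations=5):
    """Evolve (row sample, feature subset) pairs; returns (initial, final)."""
    n_features = len(X[0])

    def random_individual():
        rows = tuple(rng.choice(train) for _ in range(sample_size))  # with replacement
        feats = tuple(rng.sample(range(n_features), n_feats))
        return rows, feats

    def score(ind):
        # Hypothetical fitness: validation accuracy of the member trained on `ind`.
        return vote_accuracy([fit_stump(X, y, *ind)], X, y, valid)

    def mutate(ind):
        # Resample roughly 20% of the rows; sometimes swap one feature.
        rows = tuple(rng.choice(train) if rng.random() < 0.2 else r for r in ind[0])
        feats = list(ind[1])
        if rng.random() < 0.5:
            feats[rng.randrange(n_feats)] = rng.randrange(n_features)
        return rows, tuple(feats)

    scored = sorted(((score(ind), ind) for ind in
                     [random_individual() for _ in range(pop_size)]),
                    key=lambda p: p[0], reverse=True)
    initial = list(scored)  # plain randomized sampling (bagging baseline)
    for _ in range(generations):
        children = [mutate(ind) for _, ind in scored]
        scored = sorted(scored + [(score(c), c) for c in children],
                        key=lambda p: p[0], reverse=True)[:pop_size]  # (mu + lambda)
    return initial, scored

rng = random.Random(0)
X, y = make_data(rng)
train, valid, test = list(range(0, 80)), list(range(80, 115)), list(range(115, 150))
initial, final = evolve(X, y, train, valid, rng)
for name, pop in (("random sampling", initial), ("evolved sampling", final)):
    stumps = [fit_stump(X, y, *ind) for _, ind in pop]
    print(name, "| mean member fitness:", sum(s for s, _ in pop) / len(pop),
          "| test accuracy:", vote_accuracy(stumps, X, y, test))
```

Because parents and children compete for the same slots, mean member fitness on the validation set cannot decrease across generations in this sketch. That says nothing by itself about generalization: selecting on validation accuracy can overfit the validation split, which is why the comparison against randomized sampling is reported on a separate test split. A fitness function that scores the whole ensemble, or that rewards diversity among members, could be substituted for `score` without changing the rest of the loop.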