A Novel Surrogate-assisted Evolutionary Algorithm Applied to Partition-based Ensemble Learning
This work addresses the challenge of optimizing ensemble learning partitions efficiently for machine learning practitioners, representing an incremental improvement in surrogate-assisted evolutionary methods.
The authors tackled the expensive combinatorial optimization problem of finding optimal dataset partitions for ensemble learning by proposing a novel surrogate-assisted evolutionary algorithm, which achieved better performance than alternative approaches like Bayesian optimization using only several thousand evaluations on problems with up to 500 variables.
We propose a novel surrogate-assisted Evolutionary Algorithm for solving expensive combinatorial optimization problems. We integrate a surrogate model, which is used for fitness value estimation, into a state-of-the-art P3-like variant of the Gene-Pool Optimal Mixing Algorithm (GOMEA) and adapt the resulting algorithm for solving non-binary combinatorial problems. We test the proposed algorithm on an ensemble learning problem. Ensembling several models is a common Machine Learning technique to achieve better performance. We consider ensembles of several models trained on disjoint subsets of a dataset. Finding the best dataset partitioning is naturally a combinatorial non-binary optimization problem. Fitness function evaluations can be extremely expensive if complex models, such as Deep Neural Networks, are used as learners in an ensemble. Therefore, the number of fitness function evaluations is typically limited, which calls for optimization techniques designed for expensive fitness functions. In our experiments, we use five classification datasets from the OpenML-CC18 benchmark and Support Vector Machines as learners in an ensemble. The proposed algorithm demonstrates better performance than alternative approaches, including Bayesian optimization algorithms. It manages to find better solutions using just several thousand fitness function evaluations for an ensemble learning problem with up to 500 variables.
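To make the partitioning problem concrete, the following is a minimal sketch of how a candidate solution and its fitness could be represented. It is an illustration under stated assumptions, not the authors' implementation: the integer-vector encoding (one variable per training sample, naming the subset that sample belongs to), the majority-vote aggregation, and the use of validation accuracy as fitness are assumptions, and a simple nearest-centroid classifier stands in for the Support Vector Machines so that the example runs with the standard library only. All function names are hypothetical.

```python
import random
from collections import Counter


def train_centroid_learner(samples):
    """Stand-in learner (NOT an SVM): stores one mean feature vector per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def ensemble_fitness(assignment, train, validation, n_subsets):
    """Validation accuracy of a majority-vote ensemble with one learner per subset.

    assignment[i] in {0, ..., n_subsets - 1} names the subset of training sample i,
    so the subsets are disjoint by construction (assumed encoding).
    """
    subsets = [[] for _ in range(n_subsets)]
    for sample, subset_id in zip(train, assignment):
        subsets[subset_id].append(sample)
    # Train one learner per non-empty subset; this is the expensive step.
    learners = [train_centroid_learner(s) for s in subsets if s]
    if not learners:
        return 0.0
    correct = 0
    for features, label in validation:
        votes = Counter(predict(model, features) for model in learners)
        if votes.most_common(1)[0][0] == label:
            correct += 1
    return correct / len(validation)


def make_toy_data(n, rng):
    """Two Gaussian blobs in 2D, purely synthetic illustration data."""
    data = []
    for i in range(n):
        label = i % 2
        center = 3.0 * label
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


rng = random.Random(0)
train = make_toy_data(60, rng)
validation = make_toy_data(40, rng)
n_subsets = 3
# One non-binary variable per training sample: a random candidate partition.
assignment = [rng.randrange(n_subsets) for _ in train]
fitness = ensemble_fitness(assignment, train, validation, n_subsets)
```

In this encoding, each call to `ensemble_fitness` retrains every learner in the ensemble, which is why the number of evaluations is the limiting resource and why replacing most calls with a cheap surrogate estimate is attractive.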