Ensemble pruning via an integer programming approach with diversity constraints
This work addresses the challenge of improving predictive performance in ensemble learning for binary classification. The contribution is incremental: it builds on existing pruning methods and adds a new optimization technique.
The paper tackles the problem of selecting optimal classifier subsets in ensemble learning for binary classification. It proposes an integer programming approach with diversity constraints and reports results competitive with existing pruning methods on datasets with up to 60,000 data points.
Ensemble learning combines multiple classifiers in the hope of obtaining better predictive performance. Empirical studies have shown that ensemble pruning, that is, choosing an appropriate subset of the available classifiers, can lead to predictions that are comparable to or better than those obtained using all classifiers. In this paper, we consider a binary classification problem and propose an integer programming (IP) approach for selecting optimal classifier subsets. We propose a flexible objective function that can be adapted to the criteria desired for different datasets. We also propose constraints that ensure minimum diversity levels in the ensemble. Although IP is NP-hard in general, state-of-the-art solvers are able to quickly obtain good solutions for datasets with up to 60,000 data points. Our approach yields competitive results when compared to some of the best and most widely used pruning methods in the literature.
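To make the idea of pruning under a diversity constraint concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes a majority-vote accuracy objective and a mean pairwise disagreement measure of diversity, neither of which is specified in the abstract, and it enumerates all subsets by brute force as a stand-in for an IP solver. The function names (`majority_vote`, `mean_pairwise_diversity`, `prune`) and the toy data are hypothetical.

```python
from itertools import combinations


def majority_vote(preds, subset):
    """Majority vote over the chosen classifiers (labels in {0, 1}); ties go to 1."""
    n = len(preds[0])
    return [
        1 if 2 * sum(preds[i][j] for i in subset) >= len(subset) else 0
        for j in range(n)
    ]


def disagreement(p, q):
    """Fraction of data points on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def mean_pairwise_diversity(preds, subset):
    """Assumed diversity measure: average disagreement over all pairs in the subset."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(disagreement(preds[i], preds[j]) for i, j in pairs) / len(pairs)


def prune(preds, labels, min_diversity, min_size=2):
    """Return the most accurate subset whose diversity is at least min_diversity.

    Each subset corresponds to one binary inclusion vector, which is what an
    IP solver would search over; brute-force enumeration is only viable for a
    handful of classifiers.
    """
    best_subset, best_acc = None, -1.0
    for k in range(min_size, len(preds) + 1):
        for subset in combinations(range(len(preds)), k):
            if mean_pairwise_diversity(preds, subset) < min_diversity:
                continue  # violates the (assumed) diversity constraint
            votes = majority_vote(preds, subset)
            acc = sum(v == y for v, y in zip(votes, labels)) / len(labels)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Toy data: four classifiers, eight data points. Classifier 3 duplicates classifier 0.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [
    [1, 0, 1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 0, 1],
]
subset, acc = prune(preds, labels, min_diversity=0.3)
print(subset, acc)
```

In this toy example each of the first three classifiers errs on a different pair of points, so their majority vote corrects every individual mistake, while adding the duplicate classifier lowers accuracy. This is the kind of effect that motivates pruning with diversity in mind.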