Selecting a number of voters for a voting ensemble
This work addresses a specific optimization problem that practitioners face in ensemble learning. The contribution is incremental, since it builds on existing voting ensemble techniques.
The paper tackles the problem of selecting the number of voters in a voting ensemble. It shows that any number of voters can minimize the error rate, depending on the out-of-sample distribution of the number of classifiers in error. It also proposes estimating that distribution and then inferring the error rate for each number of voters, which gives lower-variance estimates than estimating those error rates directly.
For a voting ensemble that selects an odd-sized subset of the ensemble classifiers at random for each example, applies them to the example, and returns the majority vote, we show that any number of voters may minimize the error rate over an out-of-sample distribution. The optimal number of voters depends on the out-of-sample distribution of the number of classifiers in error. To select a number of voters to use, estimating that distribution then inferring error rates for numbers of voters gives lower-variance estimates than directly estimating those error rates.
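The inference step described above can be sketched in code. This is a minimal illustration, not the paper's implementation. It assumes the voters are drawn uniformly without replacement from an ensemble of n classifiers, so that, given m classifiers in error on an example, the number of erring voters among k follows a hypergeometric distribution. The function names and the input `p` (where `p[m]` is an estimated probability that exactly m of the n classifiers err) are hypothetical names introduced for this example.

```python
from math import comb


def majority_error_given_m(n, m, k):
    """Probability that a majority of k randomly chosen voters err,
    given that exactly m of the n ensemble classifiers err.
    Assumes voters are a uniform random subset (hypergeometric tail)."""
    need = k // 2 + 1  # smallest number of erring voters that forms a majority
    wrong = sum(
        comb(m, j) * comb(n - m, k - j)  # comb returns 0 when k - j > n - m
        for j in range(need, min(k, m) + 1)
    )
    return wrong / comb(n, k)


def error_rate(p, k):
    """Inferred error rate for k voters, where p[m] is the (estimated)
    probability that exactly m of the n = len(p) - 1 classifiers err."""
    n = len(p) - 1
    return sum(p[m] * majority_error_given_m(n, m, k) for m in range(n + 1))


def best_k(p):
    """Odd number of voters with the lowest inferred error rate."""
    n = len(p) - 1
    return min(range(1, n + 1, 2), key=lambda k: error_rate(p, k))


if __name__ == "__main__":
    # Three illustrative (made-up) distributions over m for n = 5 classifiers.
    for p in ([0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0.5, 0, 0.5, 0, 0]):
        rates = {k: round(error_rate(p, k), 3) for k in (1, 3, 5)}
        print(p, rates, "best k:", best_k(p))
```

In this toy setting with five classifiers, the three made-up distributions favor 5, 1, and 3 voters respectively, which illustrates how the best number of voters can change with the distribution of the number of classifiers in error.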