ML LG CO · Jun 7, 2019

On the Current State of Research in Explaining Ensemble Performance Using Margins

arXiv:1906.03123v1 · 1 citation

Originality: Synthesis-oriented

AI Analysis

This work addresses a foundational theoretical question in machine learning, but it appears incremental: it tests established assumptions rather than introducing new paradigms.

The paper uses experiments on real and simulated datasets to test whether existing theoretical explanations of ensemble performance hold, focusing on the claim from prior research that larger margins lead to lower generalization error.

Empirical evidence shows that ensembles, such as bagging, boosting, random and rotation forests, generally perform better in terms of their generalization error than individual classifiers. To explain this performance, Schapire et al. (1998) developed an upper bound on the generalization error of an ensemble based on the margins of the training data, from which it was concluded that larger margins should lead to lower generalization error, everything else being equal. Many other researchers have backed this assumption and presented tighter bounds on the generalization error based on either the margins or functions of the margins. For instance, Shen and Li (2010) provide evidence suggesting that the generalization error of a voting classifier might be reduced by increasing the mean and decreasing the variance of the margins. In this article we propose several techniques and empirically test whether the current state of research in explaining ensemble performance holds. We evaluate the proposed methods through experiments with real and simulated data sets.
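The margin-based quantities mentioned in the abstract can be made concrete with a small sketch. It assumes binary labels in {-1, +1} and a weighted majority vote f(x) = Σ_t α_t h_t(x) / Σ_t α_t, so each training margin y·f(x) lies in [-1, 1] and is positive exactly when the vote is correct. The function names `voting_margins` and `margin_cdf` and the toy data are hypothetical illustrations, not code from the paper. `margin_cdf` computes the empirical margin distribution P_S[y f(x) ≤ θ], the term that appears in the Schapire et al. (1998) bound, and the mean and variance are the statistics Shen and Li (2010) suggest raising and lowering, respectively. Using the population variance here is an assumption.

```python
from statistics import mean, pvariance


def voting_margins(predictions, weights, labels):
    """Return the normalized margin y_i * f(x_i) of each training example.

    f(x) = sum_t alpha_t * h_t(x) / sum_t alpha_t, so every margin lies in [-1, 1];
    a positive margin means the weighted vote classifies the example correctly.

    predictions: T lists, where predictions[t][i] in {-1, +1} is h_t(x_i)
    weights:     T non-negative voting weights alpha_t
    labels:      n true labels y_i in {-1, +1}
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("voting weights must sum to a positive value")
    margins = []
    for i, y in enumerate(labels):
        # Weighted vote of all base classifiers on example i.
        vote = sum(w * preds[i] for w, preds in zip(weights, predictions))
        margins.append(y * vote / total)
    return margins


def margin_cdf(margins, theta):
    """Fraction of training examples whose margin is at most theta."""
    return sum(1 for m in margins if m <= theta) / len(margins)


# Toy ensemble (hypothetical data): 3 base classifiers, 4 training examples.
predictions = [
    [1, 1, -1, 1],
    [1, -1, -1, 1],
    [1, 1, 1, -1],
]
weights = [5, 3, 2]
labels = [1, 1, -1, -1]

margins = voting_margins(predictions, weights, labels)
print("margins:", margins)
print("mean:", mean(margins), "variance:", pvariance(margins))
print("fraction with margin <= 0.5:", margin_cdf(margins, 0.5))
```

Calling `margin_cdf(margins, 0.0)` gives the training error, with ties at zero counted as errors. Larger values of θ show how much of the training sample is classified with only a small margin.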

