To Boost or not to Boost: On the Limits of Boosted Neural Networks
This work examines whether boosting is a parameter-efficient way to build ensembles of neural networks. It offers useful insights for machine learning researchers and practitioners, though the contribution is incremental, as it builds on existing studies of boosting and neural networks.
The paper investigates how effective boosting is for neural networks compared with decision trees. It proves that a sum of decision trees cannot be represented by a single tree with the same number of parameters, whereas a sum of CNNs can be represented by a single CNN. Experiments on object recognition datasets show that a single neural network generalizes better than a boosted ensemble of smaller networks with the same total number of parameters, which is the opposite of the well-known result for decision trees.
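The representability claim can be illustrated with a small sketch. The paper's proof concerns CNNs; the code below instead uses one-hidden-layer ReLU MLPs as a simpler analogue, so it is an assumption-laden illustration rather than the authors' construction. All names (`mlp`, `random_mlp`, `merge`, `count_params`) and sizes are hypothetical. The idea is that stacking the hidden units of the small networks and concatenating their output weights, scaled by the ensemble coefficients, yields one wider network that computes the same weighted sum.

```python
import random

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network: W2 * relu(W1 * x + b1) + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]

def random_mlp(rng, d_in, d_hidden, d_out):
    """Random weights for a small network (illustrative only)."""
    W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
    W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
    b2 = [rng.gauss(0, 1) for _ in range(d_out)]
    return W1, b1, W2, b2

def merge(nets, alphas):
    """Build one wider MLP computing sum_k alphas[k] * nets[k](x)."""
    W1 = [row for net in nets for row in net[0]]   # stack hidden units
    b1 = [b for net in nets for b in net[1]]
    d_out = len(nets[0][2])
    # Concatenate output weights in the same order, scaled by alpha.
    W2 = [[a * w for net, a in zip(nets, alphas) for w in net[2][o]]
          for o in range(d_out)]
    b2 = [sum(a * net[3][o] for net, a in zip(nets, alphas))
          for o in range(d_out)]
    return W1, b1, W2, b2

def count_params(net):
    W1, b1, W2, b2 = net
    return sum(len(r) for r in W1) + len(b1) + sum(len(r) for r in W2) + len(b2)

rng = random.Random(0)
nets = [random_mlp(rng, 5, 4, 3) for _ in range(2)]
alphas = [0.7, 0.3]                     # hypothetical ensemble weights
x = [rng.gauss(0, 1) for _ in range(5)]

outputs = [mlp(x, *net) for net in nets]
ensemble_out = [sum(a * y for a, y in zip(alphas, ys)) for ys in zip(*outputs)]
merged = merge(nets, alphas)
merged_out = mlp(x, *merged)
max_gap = max(abs(u - v) for u, v in zip(ensemble_out, merged_out))
print(max_gap, count_params(merged), sum(count_params(n) for n in nets))
```

In this sketch the merged network needs no more parameters than the ensemble, because the separate output biases collapse into one. No comparable merge exists for decision trees under the paper's result, which is the structural difference the summary refers to.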
Boosting is a method for finding a highly accurate hypothesis by linearly combining many ``weak'' hypotheses, each of which may be only moderately accurate. Boosting is thus a method for learning an ensemble of classifiers. While boosting has been shown to be very effective for decision trees, its impact on neural networks has not been extensively studied. We prove an important difference between sums of decision trees and sums of convolutional neural networks (CNNs): a sum of decision trees cannot be represented by a single decision tree with the same number of parameters, while a sum of CNNs can be represented by a single CNN. Next, using standard object recognition datasets, we verify experimentally the well-known result that a boosted ensemble of decision trees usually generalizes much better on test data than a single decision tree with the same number of parameters. In contrast, using the same datasets and boosting algorithms, our experiments show the opposite to be true for neural networks, both CNNs and multilayer perceptrons (MLPs): a single neural network usually generalizes better than a boosted ensemble of smaller neural networks with the same total number of parameters.
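To make the notion of linearly combining weak hypotheses concrete, here is a minimal sketch of AdaBoost with decision stumps on a toy one-dimensional dataset. AdaBoost is used only as a familiar example of boosting; the abstract does not say which boosting algorithms the paper uses, and the dataset, function names, and stump parameterization below are assumptions made for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak hypothesis: predicts polarity if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, t, pol) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak hypotheses."""
    n = len(xs)
    w = [1.0 / n] * n                      # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, t, pol = fit_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Up-weight misclassified examples, down-weight correct ones.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted sum of the weak hypotheses."""
    score = sum(a * stump_predict(x, t, pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump separates the labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single_err = fit_stump(xs, ys, [0.1] * 10)[0]
ensemble = adaboost(xs, ys, rounds=3)
train_acc = sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
print(single_err, train_acc)
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted sum of three stumps fits all of them. This only shows training behavior on a contrived example; it says nothing about the generalization comparison the abstract reports.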