Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks
This work addresses a practical challenge for machine learning practitioners: getting the best performance out of an ensemble. The contribution is incremental, since it builds on existing ensembling methods.
The paper tackles the problem of building effective ensembles of deep neural networks. It proposes and evaluates novel strategies, such as TreeNets (parameter sharing across ensemble members) and diversity-encouraging losses, and reports significantly higher oracle accuracies than classical ensembles.
Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks. Most benchmarks are led by ensembles of these powerful learners, but ensembling is typically treated as a post-hoc procedure implemented by averaging independently trained models with model variation induced by bagging or random initialization. In this paper, we rigorously treat ensembling as a first-class problem to explicitly address the question: what are the best strategies to create an ensemble? We first compare a large number of ensembling strategies, and then propose and evaluate novel strategies, such as parameter sharing (through a new family of models we call TreeNets) as well as training under ensemble-aware and diversity-encouraging losses. We demonstrate that TreeNets can improve ensemble performance and that diverse ensembles can be trained end-to-end under a unified loss, achieving significantly higher "oracle" accuracies than classical ensembles.
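To make the distinction between classical averaged accuracy and "oracle" accuracy concrete, the following is a minimal sketch and not the authors' code. It assumes the common reading of oracle accuracy: an example counts as correct if at least one ensemble member predicts its label. The function names (`averaged_accuracy`, `oracle_accuracy`) and the toy predictions are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

# Layout assumption: member_probs[m][n][c] is the probability that
# ensemble member m assigns to class c on example n.
Probs = List[List[List[float]]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def averaged_accuracy(member_probs: Probs, labels: Sequence[int]) -> float:
    """Classical ensemble: average member probabilities, then predict."""
    num_members = len(member_probs)
    correct = 0
    for n, label in enumerate(labels):
        num_classes = len(member_probs[0][n])
        mean = [
            sum(member[n][c] for member in member_probs) / num_members
            for c in range(num_classes)
        ]
        correct += argmax(mean) == label
    return correct / len(labels)


def oracle_accuracy(member_probs: Probs, labels: Sequence[int]) -> float:
    """Oracle: correct if ANY single member predicts the true label."""
    correct = 0
    for n, label in enumerate(labels):
        correct += any(argmax(member[n]) == label for member in member_probs)
    return correct / len(labels)


# Toy data (made up): 2 members, 3 examples, 2 classes.
probs = [
    [[0.9, 0.1], [0.9, 0.1], [0.6, 0.4]],  # member A
    [[0.2, 0.8], [0.4, 0.6], [0.7, 0.3]],  # member B
]
labels = [0, 1, 1]

print("averaged:", averaged_accuracy(probs, labels))
print("oracle:  ", oracle_accuracy(probs, labels))
```

In this toy case the members disagree on the second example. Averaging follows the more confident, wrong member, while the oracle credits the member that got it right, so oracle accuracy is higher than averaged accuracy. A gap of this kind is what one would expect a diverse ensemble to widen.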