PFGE: Parsimonious Fast Geometric Ensembling of DNNs
PFGE addresses the test-time memory inefficiency of ensemble methods in deep learning, offering an incremental improvement over existing fast geometric ensembling techniques.
The paper tackles the high computational and memory overhead of ensemble methods in deep learning by proposing PFGE, a parsimonious fast geometric ensembling method that is 5x more memory-efficient than previous methods on CIFAR-{10,100} and ImageNet without compromising generalization performance.
Ensemble methods are commonly used to enhance the generalization performance of machine learning models. However, they present a challenge in deep learning systems due to the high computational overhead required to train an ensemble of deep neural networks (DNNs). Recent advancements such as fast geometric ensembling (FGE) and snapshot ensembles have addressed this issue by training model ensembles in the same time it takes to train a single model. Nonetheless, these techniques still require more memory for test-time inference than single-model-based methods. In this paper, we propose a new method called parsimonious FGE (PFGE), which employs a lightweight ensemble of higher-performing DNNs generated through successive stochastic weight averaging procedures. Our experimental results on the CIFAR-{10,100} and ImageNet datasets across various modern DNN architectures demonstrate that PFGE achieves 5x memory efficiency compared to previous methods, without compromising generalization performance. Our code is available at https://github.com/ZJLAB-AMMI/PFGE.
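
The idea of building a small ensemble from successive weight-averaging stages can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that); it only assumes what the abstract states, namely that each ensemble member comes from a stochastic weight averaging procedure and that the members are combined at test time. The names `swa_average`, `successive_swa`, and `ensemble_predict`, the use of flat lists of floats in place of real network weights, and the equal-weight averaging of predictions are all assumptions made for illustration.

```python
from typing import List, Sequence

# A flattened weight vector standing in for a DNN's parameters (illustrative only).
Weights = List[float]


def swa_average(snapshots: Sequence[Weights]) -> Weights:
    """Equal-weight average of weight snapshots, the core step of weight averaging."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    n = len(snapshots)
    return [sum(ws) / n for ws in zip(*snapshots)]


def successive_swa(snapshots: Sequence[Weights], per_stage: int) -> List[Weights]:
    """Split consecutive snapshots into stages and keep one averaged model per stage.

    Hypothetical simplification: a real procedure would keep training between
    stages; here the snapshots are assumed to be given up front.
    """
    if per_stage <= 0:
        raise ValueError("per_stage must be positive")
    return [
        swa_average(snapshots[i:i + per_stage])
        for i in range(0, len(snapshots), per_stage)
    ]


def ensemble_predict(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by the ensemble members."""
    if not member_probs:
        raise ValueError("need at least one member prediction")
    n = len(member_probs)
    return [sum(ps) / n for ps in zip(*member_probs)]


if __name__ == "__main__":
    # Four toy snapshots averaged two at a time leave two models to store.
    snaps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    members = successive_swa(snaps, per_stage=2)
    print(len(snaps), "snapshots ->", len(members), "ensemble members")
```

In this toy setting, the number of models kept for inference is the number of stages rather than the number of snapshots, which is the sense in which such an ensemble is lightweight. The actual memory savings reported in the paper depend on the authors' configuration, not on this sketch.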