RADIN: Souping on a Budget
This work addresses a computational bottleneck for practitioners using model soups, offering a more resource-efficient approach, though it is incremental as it builds on existing model soup techniques.
The paper tackles the computational cost of selecting model subsets for model soups. It proposes RADIN, which approximates the performance of a soup by the performance of the averaged ensemble logits of its members. The approximation is supported by theoretical analysis, and the method improves performance by up to 4% on ImageNet at lower evaluation budgets compared to the prior greedy method.
Model Soups, extending Stochastic Weight Averaging (SWA), combine models fine-tuned with different hyperparameters. Yet their adoption is hindered by the computational cost of selecting which subset of models to average. In this paper, we propose to speed up model soups by approximating soup performance with the performance of averaged ensemble logits. Theoretical insights validate the congruence between ensemble logits and weight-averaged soups across any mixing ratio. Our Resource ADjusted soups craftINg (RADIN) procedure stands out by allowing flexible evaluation budgets: users can adjust their exploration budget to their resources, while performance at lower budgets improves over the previous greedy approach (up to 4% on ImageNet).
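The core idea, scoring a candidate soup by the accuracy of its members' averaged logits instead of building and evaluating the weight-averaged model, can be illustrated with a minimal sketch. This is not the RADIN procedure itself: the function names (`average_logits`, `accuracy`, `greedy_proxy_soup`), the uniform mixing ratio, and the toy data are assumptions made for illustration, and the sketch does not model the evaluation budget. It assumes each model's logits on a held-out set have been computed once and cached.

```python
from typing import List, Sequence, Tuple

Logits = List[List[float]]  # shape: [n_examples][n_classes]


def average_logits(logit_sets: Sequence[Logits]) -> Logits:
    """Uniformly average cached logits over models (the ensemble proxy)."""
    n_models = len(logit_sets)
    return [
        [sum(values) / n_models for values in zip(*rows)]
        for rows in zip(*logit_sets)  # one tuple of rows per example
    ]


def accuracy(logits: Logits, labels: Sequence[int]) -> float:
    """Top-1 accuracy; ties go to the lowest class index."""
    correct = sum(
        1
        for row, label in zip(logits, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)


def greedy_proxy_soup(
    cached_logits: Sequence[Logits], labels: Sequence[int]
) -> Tuple[List[int], float]:
    """Greedy subset selection scored only with averaged ensemble logits.

    Hypothetical sketch: no weight-averaged model is ever evaluated here;
    the ensemble score stands in for the soup score.
    """
    # Visit models from best to worst individual held-out accuracy.
    order = sorted(
        range(len(cached_logits)),
        key=lambda i: accuracy(cached_logits[i], labels),
        reverse=True,
    )
    selected = [order[0]]
    best = accuracy(cached_logits[order[0]], labels)
    for i in order[1:]:
        candidate = selected + [i]
        score = accuracy(
            average_logits([cached_logits[j] for j in candidate]), labels
        )
        if score >= best:  # keep the model only if the proxy does not drop
            selected, best = candidate, score
    return selected, best


# Toy held-out set: 4 examples, 2 classes, 3 fine-tuned models (made-up logits).
labels = [0, 1, 0, 1]
model_a = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [1.0, 0.0]]
model_b = [[2.0, 0.0], [0.0, 2.0], [0.0, 1.0], [0.0, 3.0]]
model_c = [[0.0, 5.0], [5.0, 0.0], [0.0, 5.0], [5.0, 0.0]]

selected, best = greedy_proxy_soup([model_a, model_b, model_c], labels)
print(selected, best)  # prints: [0, 1] 1.0
```

Because the logits are cached, scoring a candidate subset costs only an average and an argmax over the held-out set, with no extra forward pass. In practice one would presumably still confirm the most promising subsets with a true weight-averaged evaluation, which is where an evaluation budget would be spent.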