LG AI Jun 29, 2023

Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging

arXiv:2306.16788v3
22 citations
h-index: 29
Originality: Incremental advance
AI Analysis

The paper targets machine learning practitioners who want better-performing sparse models, and it offers an incremental improvement over existing pruning techniques.

This work tackles the challenge of combining sparsity and parameter averaging in neural networks by introducing Sparse Model Soups (SMS), a method that merges sparse models from iterative pruning cycles, resulting in improved generalization and out-of-distribution performance over individual models.

Neural networks can be significantly compressed by pruning, yielding sparse models with reduced storage and computational demands while preserving predictive performance. Model soups (Wortsman et al., 2022) enhance generalization and out-of-distribution (OOD) performance by averaging the parameters of multiple models into a single one, without increasing inference time. However, achieving both sparsity and parameter averaging is challenging as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. This work addresses these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varied hyperparameter configurations such as batch ordering or weight decay yields models suitable for averaging, sharing identical sparse connectivity by design. Averaging these models significantly enhances generalization and OOD performance over their individual counterparts. Building on this, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model from the previous phase. SMS preserves sparsity, exploits sparse network benefits, is modular and fully parallelizable, and substantially improves IMP's performance. We further demonstrate that SMS can be adapted to enhance state-of-the-art pruning-during-training approaches.
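The two observations in the abstract can be illustrated with a toy sketch in plain Python: averaging independently pruned models loses sparsity, while averaging models that share one mask (as in SMS) keeps it. This is not the authors' implementation. The flat weight list, the sparsity schedule, and the helper names are assumptions, and `retrain_stub` is a hypothetical stand-in that adds noise where the real method retrains with varied hyperparameters such as batch ordering or weight decay.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def retrain_stub(weights, mask, seed):
    """Hypothetical stand-in for retraining: perturb the weights, then re-apply the mask."""
    rng = random.Random(seed)
    return [(w + rng.gauss(0.0, 0.1)) * m for w, m in zip(weights, mask)]


def average(models):
    """Element-wise parameter average of several models (the 'soup')."""
    return [sum(ws) / len(ws) for ws in zip(*models)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


def sparse_model_soup(init_weights, prune_fraction, n_cycles, n_models):
    """Toy prune-retrain loop in which each cycle starts from the previous average."""
    weights = list(init_weights)
    for cycle in range(1, n_cycles + 1):
        # Assumed schedule: prune `prune_fraction` of the remaining weights per cycle.
        target = 1.0 - (1.0 - prune_fraction) ** cycle
        mask = magnitude_mask(weights, target)
        # All candidates share `mask`, so their average has the same zero pattern.
        candidates = [
            retrain_stub(weights, mask, seed=1000 * cycle + k)
            for k in range(n_models)
        ]
        weights = average(candidates)
    return weights


rng = random.Random(0)
dense_a = [rng.gauss(0.0, 1.0) for _ in range(100)]
dense_b = [rng.gauss(0.0, 1.0) for _ in range(100)]

# Two models pruned independently: their masks differ, so the average is denser.
sparse_a = [w * m for w, m in zip(dense_a, magnitude_mask(dense_a, 0.5))]
sparse_b = [w * m for w, m in zip(dense_b, magnitude_mask(dense_b, 0.5))]
naive_soup = average([sparse_a, sparse_b])

# SMS-style soup: shared mask in every cycle, so sparsity is preserved.
sms_soup = sparse_model_soup(dense_a, prune_fraction=0.5, n_cycles=2, n_models=3)

print("naive soup sparsity:", sparsity(naive_soup))
print("SMS soup sparsity:", sparsity(sms_soup))
```

Because already-pruned weights have magnitude zero, they sort first in `magnitude_mask`, so each cycle's mask extends the previous one. That is why the averaged model can seed the next prune-retrain cycle without losing the sparsity reached so far.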

Code Implementations: 1 repo
