LG | Oct 17, 2023

In defense of parameter sharing for model-compression

arXiv:2310.11611v1 | 8 citations | h-index: 32
Originality: Highly original
AI Analysis

This work addresses the problem of efficient model compression for machine learning practitioners, advocating for a paradigm shift toward parameter-sharing methods.

This paper comprehensively compares randomized parameter-sharing (RPS) with pruning and smaller models for model compression, finding that RPS consistently outperforms or matches these alternatives across compression ranges, especially at high compression where it beats even highly informed pruning techniques like Lottery Ticket Rewinding. The authors also identify and provably fix stability and continuity issues in the state-of-the-art RPS method ROAST, proposing an improved version called STABLE-RPS.

When considering a model architecture, there are several ways to reduce its memory footprint. Historically, popular approaches included selecting smaller architectures and creating sparse networks through pruning. More recently, randomized parameter-sharing (RPS) methods have gained traction for model compression at the start of training. In this paper, we comprehensively assess the trade-off between memory and accuracy across RPS, pruning techniques, and building smaller models. Our findings demonstrate that RPS, which is both data- and model-agnostic, consistently outperforms or matches smaller models and all moderately informed pruning strategies, such as MAG, SNIP, SYNFLOW, and GRASP, across the entire compression range. This advantage becomes particularly pronounced in higher compression scenarios. Notably, even when compared to highly informed pruning techniques like Lottery Ticket Rewinding (LTR), RPS exhibits superior performance in high compression settings. This points to an inherent capacity advantage that RPS enjoys over sparse models. Theoretically, we establish RPS as a superior technique in terms of memory-efficient representation when compared to pruning for linear models. This paper argues in favor of a paradigm shift towards RPS-based models. During our rigorous evaluation of RPS, we identified issues in the state-of-the-art RPS technique ROAST, specifically regarding stability (ROAST's sensitivity to initialization hyperparameters, often leading to divergence) and Pareto-continuity (ROAST's inability to recover the accuracy of the original model at zero compression). We provably address both of these issues. We refer to the modified RPS, which incorporates our improvements, as STABLE-RPS.
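
To make the idea of randomized parameter sharing concrete, the following is a minimal sketch of generic hashing-based weight sharing: every position of a large "virtual" weight matrix is mapped by a hash to one slot of a much smaller shared array, so several positions reuse the same stored value. This is not ROAST or STABLE-RPS, whose details the abstract does not give. The hash constants, the sign trick, and all names (`slot_and_sign`, `virtual_weight`, `forward`) are assumptions chosen for illustration.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, used by the illustrative hash below


def slot_and_sign(i, j, num_shared, seed=0):
    """Map virtual position (i, j) to a shared slot and a +/-1 sign.

    Hypothetical hash for illustration only. It is computed on the fly,
    so no per-weight index needs to be stored.
    """
    h = (seed * 1_000_003 + i * 92_821 + j * 68_917 + 12_345) % PRIME
    g = (seed * 7_919 + i * 104_729 + j * 1_299_709 + 1) % PRIME
    return h % num_shared, (1.0 if g % 2 == 0 else -1.0)


def virtual_weight(shared, i, j, seed=0):
    """Recover the virtual weight W[i][j] from the shared array."""
    k, s = slot_and_sign(i, j, len(shared), seed)
    return s * shared[k]


def forward(shared, x, rows, seed=0):
    """Compute y = W x, where W is the (rows x len(x)) virtual matrix."""
    return [
        sum(virtual_weight(shared, i, j, seed) * x[j] for j in range(len(x)))
        for i in range(rows)
    ]


# 8 x 16 = 128 virtual weights backed by only 32 stored parameters.
rows, cols, num_shared = 8, 16, 32
rng = random.Random(0)
shared = [rng.gauss(0.0, 0.1) for _ in range(num_shared)]
x = [1.0] * cols
y = forward(shared, x, rows)
compression = (rows * cols) / num_shared  # ratio of virtual to stored weights
```

In this sketch the memory budget is set directly by `num_shared`, independent of the data or the architecture, which mirrors the abstract's description of RPS as data- and model-agnostic. A pruned model with the same budget would instead keep 32 of the 128 weights and fix the rest to zero.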

