LG | CV | Jan 8, 2021

Spending Your Winning Lottery Better After Drawing It

arXiv:2101.03255v3 | 13 citations
AI Analysis

This work provides an incremental improvement to the Lottery Ticket Hypothesis, benefiting researchers and practitioners working on neural network pruning and efficient model training.

This paper demonstrates that sparse sub-networks identified by the Lottery Ticket Hypothesis (LTH) do not need to strictly inherit training protocols from their dense counterparts. By introducing specific architectural and training recipe "tweaks," the authors achieve state-of-the-art LTH performance, with a significant gain of 1.05% - 4.93% for ResNet18 on CIFAR-100 over vanilla-LTH.

The Lottery Ticket Hypothesis (LTH) suggests that a dense neural network contains a sparse sub-network that can match the performance of the original dense network when trained in isolation from scratch. Most works retrain the sparse sub-network with the same training protocols as its dense network, such as initialization, architecture blocks, and training recipes. However, it remains unclear whether these training protocols are optimal for sparse networks. In this paper, we demonstrate that it is unnecessary for sparse retraining to strictly inherit those properties from the dense network. Instead, by plugging in purposeful "tweaks" to the sparse sub-network architecture or its training recipe, its retraining can be significantly improved over the default, especially at high sparsity levels. Combining all our proposed "tweaks" yields new state-of-the-art LTH performance, and these modifications can be easily adapted to other sparse training algorithms in general. Specifically, we achieve a significant and consistent performance gain of 1.05% - 4.93% for ResNet-18 on CIFAR-100 over vanilla-LTH. Moreover, our methods are shown to generalize across datasets (CIFAR-10, CIFAR-100, TinyImageNet) and architectures (VGG16, ResNet-18/ResNet-34, MobileNet). All code will be made publicly available.
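To make the "sparse sub-network" in the abstract concrete, here is a minimal sketch of how a vanilla-LTH ticket is commonly built: weights are pruned by magnitude after training, and the surviving weights are reset to their initial values before retraining. This is a one-shot simplification on a flat weight list, not the paper's code; it does not implement the proposed "tweaks", and the function names and the random stand-in for training are assumptions made for illustration.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def lottery_ticket(init_weights, trained_weights, sparsity):
    """Mask comes from the trained weights; surviving values come from initialization."""
    mask = magnitude_mask(trained_weights, sparsity)
    ticket = [w * m for w, m in zip(init_weights, mask)]
    return mask, ticket


random.seed(0)
init = [random.gauss(0.0, 1.0) for _ in range(10)]
# Hypothetical stand-in for training: perturb the initial weights.
trained = [w + random.gauss(0.0, 0.5) for w in init]

mask, ticket = lottery_ticket(init, trained, sparsity=0.8)
print("kept weights:", sum(mask), "of", len(mask))
```

The resulting `ticket` is what vanilla-LTH would retrain with the dense network's original protocol; the abstract's claim is that this retraining step need not inherit that protocol unchanged.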

Code Implementations: 1 repo