Linear Mode Connectivity in Sparse Neural Networks
This work addresses the challenge of reducing data requirements for sparse network training in machine learning, though it is incremental as it builds on existing pruning and distillation methods.
The paper tackles the problem of training sparse neural networks more efficiently by running Iterative Magnitude Pruning on synthetic data produced by dataset distillation. It finds that the resulting subnetworks are more stable and achieve linear mode connectivity, matching traditional pruning performance with up to 150x less training data.
With rising interest in sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new class of sparse networks that are more stable to SGD noise on the real data than either the dense model or subnetworks found with real data in IMP. That is, synthetically chosen subnetworks often train to the same minimum, i.e., they exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measurements of the diagonal of the Hessian. While dataset distillation as a field is still young, we find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.
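To make the linear interpolation test concrete, the following is a minimal sketch and not the paper's code. It assumes two trained solutions are given as flat weight vectors and that a loss function over weights is available; the names `interpolate`, `loss_barrier`, `convex_loss`, and `double_well_loss` are hypothetical, and the toy losses stand in for a real network evaluated on real data. The barrier is taken here as the highest loss on the straight path between the two solutions minus the mean loss at the endpoints, which is one common convention; a barrier near zero is what linear mode connectivity would look like under this definition.

```python
from typing import Callable, List, Sequence


def interpolate(w_a: Sequence[float], w_b: Sequence[float], alpha: float) -> List[float]:
    """Return the point (1 - alpha) * w_a + alpha * w_b on the straight path."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    w_a: Sequence[float],
    w_b: Sequence[float],
    num_points: int = 11,
) -> float:
    """Highest loss on the linear path minus the mean loss of the two endpoints.

    Assumed convention for this sketch: values near zero suggest the two
    solutions are linearly mode connected; large values suggest a barrier.
    """
    alphas = [i / (num_points - 1) for i in range(num_points)]
    path_losses = [loss_fn(interpolate(w_a, w_b, alpha)) for alpha in alphas]
    endpoint_mean = 0.5 * (path_losses[0] + path_losses[-1])
    return max(path_losses) - endpoint_mean


def convex_loss(w: Sequence[float]) -> float:
    """Toy convex bowl: no barrier can appear between two points of equal loss."""
    return sum(x * x for x in w)


def double_well_loss(w: Sequence[float]) -> float:
    """Toy loss with two separate minima at -1 and +1 in each coordinate."""
    return sum((x * x - 1.0) ** 2 for x in w)


if __name__ == "__main__":
    # Two hypothetical "training runs" that end at equal-loss points of the bowl.
    print("convex barrier:", loss_barrier(convex_loss, [1.0, 0.0], [0.0, 1.0]))
    # Two hypothetical runs that end in different wells, separated by a ridge.
    print("double-well barrier:", loss_barrier(double_well_loss, [-1.0], [1.0]))
```

In practice the two weight vectors would come from training the same sparse subnetwork twice with different SGD noise (for example, different data orderings), and the loss would be evaluated on held-out real data at each interpolation point.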