CV · AI · Dec 2, 2022

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

arXiv:2212.01076v1 · 14 citations · h-index: 36 · Has Code
Originality: Incremental advance
AI Analysis

The work addresses the need for computationally efficient inference in neural networks, though it appears incremental because it builds on existing sparsity techniques.

The paper tackles the problem of training sparse neural networks by combining soft-thresholding and straight-through gradient estimation to smoothly increase sparsity without sharp weight discontinuities, achieving state-of-the-art results in accuracy/sparsity and accuracy/FLOPS trade-offs.
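For reference, the two ingredients can be written in their standard forms. The notation is ours (θ for the raw weights, w for the thresholded weights, τ for the threshold, η for the learning rate), and how ST-3 chooses τ per layer or per step is not specified in this summary, so treat this as a sketch rather than the paper's exact formulation.

```latex
% Soft-thresholding of the raw weights \theta with threshold \tau \ge 0
w = S_\tau(\theta) = \operatorname{sign}(\theta)\,\max\bigl(|\theta| - \tau,\ 0\bigr)

% Exact derivative: zero inside the dead zone, so zeroed weights would receive no gradient
\frac{\partial S_\tau(\theta)}{\partial \theta} =
  \begin{cases} 1, & |\theta| > \tau \\ 0, & |\theta| < \tau \end{cases}

% Straight-through estimate: pass dL/dw to \theta unchanged, then take a gradient step
\frac{\partial \mathcal{L}}{\partial \theta} \approx \frac{\partial \mathcal{L}}{\partial w},
\qquad
\theta \leftarrow \theta - \eta\,\frac{\partial \mathcal{L}}{\partial w}
```

Because the update acts on θ rather than on w, a weight that is currently zeroed can still move back above the threshold (or change sign) instead of staying frozen at zero.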

Setting weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio of the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art (SoA) results, in terms of both accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably with the most recent methods, which adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification lie primarily in giving the weights the freedom to evolve smoothly across the zero state while the sparsity ratio is progressively increased. Source code and weights are available at https://github.com/vanderschuea/stthree
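A minimal sketch of the idea in plain Python, under stated assumptions. It is not the authors' implementation (the linked repository has that). The helper names (`soft_threshold`, `threshold_for_sparsity`, `sparsity_schedule`, `train`) are hypothetical; τ is assumed to be picked at each step as the magnitude quantile matching the current target sparsity; and the cubic sparsity ramp is borrowed from gradual magnitude pruning, not taken from the paper. The toy loss fits six weights to a fixed target.

```python
import math


def soft_threshold(raw, tau):
    """Soft-thresholding: shrink each raw weight toward zero by tau; |raw| <= tau becomes 0."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in raw]


def threshold_for_sparsity(raw, sparsity):
    """Hypothetical helper: return the k-th smallest magnitude, k = floor(sparsity * n),
    so soft-thresholding at this tau zeroes at least k weights."""
    k = int(sparsity * len(raw))
    if k == 0:
        return 0.0
    return sorted(abs(x) for x in raw)[k - 1]


def sparsity_schedule(step, ramp_steps, final_sparsity):
    """Assumed cubic ramp from 0 to final_sparsity over ramp_steps, then held constant."""
    t = min(step / ramp_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)


def train(raw, target, steps=200, lr=0.1, final_sparsity=0.5):
    """Toy loop minimizing 0.5 * ||S_tau(raw) - target||^2 with a straight-through gradient."""
    raw = list(raw)
    for step in range(steps):
        s = sparsity_schedule(step, steps // 2, final_sparsity)
        tau = threshold_for_sparsity(raw, s)
        w = soft_threshold(raw, tau)  # sparse weights used in the forward pass
        grad_w = [wi - ti for wi, ti in zip(w, target)]  # dL/dw of the quadratic loss
        # Straight-through estimate: treat dS_tau/draw as 1 everywhere, so even
        # zeroed weights receive dL/dw and their raw values keep evolving.
        raw = [ri - lr * gi for ri, gi in zip(raw, grad_w)]
    tau = threshold_for_sparsity(raw, final_sparsity)
    return raw, soft_threshold(raw, tau)
```

For example, `train([0.0] * 6, [1.0, -2.0, 0.05, -0.1, 3.0, 0.02])` ends with half of the weights zeroed, namely the three with the smallest target magnitudes. The raw vector is kept separate from the sparse one so that sparsity can be increased gradually without discarding the state of pruned weights.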

Code Implementations: 1 repo