LG · AI · ML · Apr 26, 2019

SWALP: Stochastic Weight Averaging in Low-Precision Training

arXiv:1904.11943v2 · 105 citations
Originality: Incremental advance
AI Analysis

This paper addresses scalability and efficiency problems for machine learning practitioners by enabling effective low-precision training, though it appears incremental, since it extends existing weight-averaging techniques.

The paper tackles the challenge of maintaining performance in low-precision training by proposing SWALP, which averages low-precision SGD iterates with a modified learning rate schedule. The results show that SWALP matches full-precision SGD performance with all numbers quantized to 8 bits and, in theoretical settings, converges closer to the optimal solution than low-precision SGD.

Low-precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low-precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low-precision SGD in strongly convex settings.
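The idea of averaging low-precision SGD iterates can be illustrated with a toy sketch on a quadratic objective. This is not the authors' implementation: the function names (`stochastic_round`, `swalp_sketch`), the fixed-point grid spacing `delta`, the gradient noise scale, the warm-up length, and the choice to keep the running average in full precision are all assumptions made for illustration. Only the weights are quantized here, whereas the paper quantizes all numbers.

```python
import numpy as np

def stochastic_round(x, delta, rng):
    """Round x onto a grid of spacing delta, rounding up with probability
    equal to the fractional part, so the rounding is unbiased."""
    scaled = x / delta
    lower = np.floor(scaled)
    prob_up = scaled - lower
    return delta * (lower + (rng.random(x.shape) < prob_up))

def swalp_sketch(w_star, steps=20000, warmup=2000, lr=0.1,
                 delta=2.0**-4, seed=0):
    """Low-precision SGD on f(w) = 0.5 * ||w - w_star||^2 with iterate averaging.

    Assumptions (illustrative, not from the paper): weights live on a
    fixed-point grid, gradients carry Gaussian noise of scale 0.1, and
    the running average is accumulated in full precision after warm-up.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_star)          # starts on the grid
    avg = np.zeros_like(w_star)
    n_avg = 0
    for t in range(steps):
        grad = (w - w_star) + 0.1 * rng.standard_normal(w.shape)
        # Low-precision SGD step: quantize the updated weights.
        w = stochastic_round(w - lr * grad, delta, rng)
        if t >= warmup:
            # Running mean of the low-precision iterates.
            n_avg += 1
            avg += (w - avg) / n_avg
    return w, avg

# Optimum chosen off the grid, so no single low-precision iterate can hit it.
w_star = np.array([0.28125, -0.71875, 1.21875])
last, avg = swalp_sketch(w_star)
print("last-iterate error:", np.linalg.norm(last - w_star))
print("averaged error:    ", np.linalg.norm(avg - w_star))
```

In this toy setting the last iterate is confined to the quantization grid, while the average is not, which is one informal way to see why averaging can land closer to the optimum than any individual low-precision iterate.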

Code Implementations: 3 repos
