LG · AI · IT · NA · Aug 10, 2025

SGD Convergence under Stepsize Shrinkage in Low-Precision Training

arXiv:2508.07142v2
Originality: Synthesis-oriented
AI Analysis

This work addresses SGD convergence in low-precision training, a technique that is crucial for reducing the computational cost of deep learning. The contribution is incremental: it extends existing SGD theory to a gradient shrinkage model.

The paper tackles the problem of SGD convergence in low-precision training, where gradient quantization causes shrinkage, and shows that this leads to slower convergence and higher steady-state error, with rates dependent on the minimum shrinkage factor.

Low-precision training has become crucial for reducing the computational and memory costs of large-scale deep learning. However, quantizing gradients introduces magnitude shrinkage, which can change how stochastic gradient descent (SGD) converges. In this study, we explore SGD convergence under a gradient shrinkage model, where each stochastic gradient is scaled by a factor \( q_k \in (0,1] \). We show that this shrinkage replaces the usual stepsize \( \mu_k \) with an effective stepsize \( \mu_k q_k \), slowing convergence when the minimum shrinkage factor satisfies \( q_{\min} < 1 \). Under standard smoothness and bounded-variance assumptions, we prove that low-precision SGD still converges, but at a slower rate governed by \( q_{\min} \), and with a higher steady-state error level due to quantization effects. Our analysis explains theoretically how lower numerical precision slows training by treating it as gradient shrinkage within the standard SGD convergence setup.
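The effective-stepsize claim can be illustrated with a minimal sketch. It is not the paper's experiment: the objective \( f(x) = \tfrac{1}{2} x^2 \), the constant shrinkage factor `q`, the Gaussian gradient noise, and the function name `shrunk_sgd` are all assumptions chosen for illustration. The sketch models only the scaling of the gradient, so it shows the slower contraction but not the additional quantization error discussed in the abstract.

```python
import random


def shrunk_sgd(x0, mu, q, steps, noise_std=0.0, seed=0):
    """Run SGD on the toy objective f(x) = 0.5 * x**2.

    Each stochastic gradient is scaled by a constant shrinkage
    factor q in (0, 1] before the update (hypothetical setup).
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Stochastic gradient of 0.5 * x**2: exact gradient x plus noise.
        grad = x + rng.gauss(0.0, noise_std)
        # Nominal stepsize mu applied to the shrunk gradient q * grad.
        x -= mu * (q * grad)
    return x


# With the same noise sequence, (mu, q) behaves like (mu * q, 1).
with_shrinkage = shrunk_sgd(1.0, mu=0.1, q=0.5, steps=50, noise_std=0.1, seed=1)
rescaled_step = shrunk_sgd(1.0, mu=0.05, q=1.0, steps=50, noise_std=0.1, seed=1)

# Noise-free runs: the iterate contracts by (1 - mu * q) per step,
# so q < 1 leaves the iterate farther from the minimizer x = 0.
full_precision = shrunk_sgd(1.0, mu=0.1, q=1.0, steps=50)
low_precision = shrunk_sgd(1.0, mu=0.1, q=0.5, steps=50)

print(with_shrinkage, rescaled_step)
print(full_precision, low_precision)
```

In this toy setting the two noisy runs coincide because the update depends on `mu` and `q` only through their product, and the noise-free run with `q = 0.5` ends farther from zero than the run with `q = 1.0` after the same number of steps.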

