ML LG NA OC

Jan 23, 2023

On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality

Lu Xia, Michiel E. Hochstenbach, Stefano Massei

arXiv:2301.09511v27.46 citations

h-index: 23

Originality: Incremental advance

AI Analysis

This paper addresses convergence problems that arise when machine learning practitioners train with low-precision computation, and it offers an incremental improvement in the choice of rounding strategy.

The paper tackles the problem of rounding errors in low-precision neural network training by analyzing gradient descent under the Polyak-Lojasiewicz inequality, showing that biased stochastic rounding can eliminate vanishing gradients and provide a stricter convergence bound than unbiased rounding.

When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding errors may be beneficial since choosing a proper rounding strategy eliminates the vanishing gradient problem and forces the rounding bias in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performances of various rounding strategies when optimizing several examples using low-precision fixed-point number formats.
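The stagnation effect described in the abstract can be reproduced with a toy experiment. The sketch below is only an illustration under stated assumptions, not the authors' method: it runs gradient descent on f(x) = x^2/2 (a function that satisfies the Polyak-Łojasiewicz inequality) with every iterate rounded onto a fixed-point grid. The grid spacing `DELTA`, the step size, the function names, and in particular the biased rule (shifting the round-up probability by a hypothetical `eps` toward the descent direction) are choices made here for illustration; the abstract does not specify the paper's rounding schemes.

```python
import math
import random

DELTA = 2.0 ** -4  # assumed fixed-point grid spacing (4 fractional bits)


def round_nearest(x, delta=DELTA):
    """Deterministic round-to-nearest onto the grid delta * Z (ties round up)."""
    return delta * math.floor(x / delta + 0.5)


def round_stochastic(x, rng, descent_sign=0, eps=0.0, delta=DELTA):
    """Stochastic rounding onto the grid delta * Z.

    With eps == 0 the rounding is unbiased: x is rounded up with probability
    equal to its fractional position between the two neighbouring grid points.
    With eps > 0 the probability is shifted toward descent_sign (+1 = up,
    -1 = down). This biased rule is a hypothetical illustration only.
    """
    lo = math.floor(x / delta)
    frac = x / delta - lo  # position in [0, 1) between grid neighbours
    if frac == 0.0:
        return lo * delta  # already representable, nothing to round
    p_up = frac
    if descent_sign > 0:
        p_up = min(1.0, frac + eps)
    elif descent_sign < 0:
        p_up = max(0.0, frac - eps)
    return delta * (lo + (1 if rng.random() < p_up else 0))


def gradient_descent(mode, x0=1.0, step=0.1, iters=200, eps=0.2, seed=0):
    """Minimise f(x) = x^2 / 2 with each iterate rounded to fixed point."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        grad = x  # gradient of x^2 / 2
        target = x - step * grad  # exact gradient descent update
        if mode == "nearest":
            x = round_nearest(target)
        elif mode == "unbiased":
            x = round_stochastic(target, rng)
        else:  # "biased": push the rounding toward -sign(grad)
            sign = -1 if grad > 0 else (1 if grad < 0 else 0)
            x = round_stochastic(target, rng, descent_sign=sign, eps=eps)
    return x


if __name__ == "__main__":
    for mode in ("nearest", "unbiased", "biased"):
        print(mode, gradient_descent(mode))
```

With round-to-nearest, the iterate stops moving once the update `step * grad` falls below half the grid spacing, so it stalls at a nonzero value. Both stochastic variants keep a nonzero chance of stepping to the next grid point and so can continue toward the minimizer at 0.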
