LG, Sep 15, 2022

Training Neural Networks in Single vs Double Precision

arXiv:2209.07219v1, 6 citations, h-index: 41
Originality: Synthesis-oriented
AI Analysis

This work addresses the trade-off between computational efficiency and accuracy that deep learning practitioners face, but it is incremental, since it evaluates existing methods on known optimization algorithms.

The study investigates whether the widespread use of single-precision floating-point arithmetic in deep learning is justified. It compares optimization performance in single and double precision for the Conjugate Gradient (CG) method and RMSprop on neural networks with up to 4 million parameters. Single precision keeps up with double precision, with superlinear convergence, as long as line search finds an improvement, and RMSprop does not benefit from double precision. CG is clearly superior for moderately nonlinear tasks, and CG in double precision is superior whenever the solutions have the potential to be useful for the application goal.

The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Test networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been optimized for Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum is known to be zero. Computing experiments show that single precision can keep up (with superlinear convergence) with double precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of MSE relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
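The kind of comparison described in the abstract can be illustrated with a toy sketch. It is not the paper's setup: it assumes a linear model instead of a neural network, and linear CG on the normal equations (where the line search has a closed form) instead of nonlinear CG with a numerical line search. The targets come from a known "teacher" weight vector, so the MSE minimum is zero in exact arithmetic, as in the paper's training tasks. The names `cg_least_squares` and `final_mse`, the problem sizes, and the iteration count are hypothetical choices for illustration.

```python
import numpy as np


def cg_least_squares(X, y, iters):
    """Linear conjugate gradient on the normal equations X^T X w = X^T y.

    All arithmetic stays in the dtype of X and y (float32 or float64).
    """
    w = np.zeros(X.shape[1], dtype=X.dtype)
    r = X.T @ (y - X @ w)      # negative gradient of 0.5 * ||X w - y||^2
    d = r.copy()               # first search direction
    rs = r @ r
    for _ in range(iters):
        Xd = X @ d
        denom = Xd @ Xd
        if rs == 0 or denom == 0:
            break              # nothing left to improve at this precision
        alpha = rs / denom     # exact line search for a quadratic objective
        w = w + alpha * d
        r = r - alpha * (X.T @ Xd)
        rs_new = r @ r
        d = r + (rs_new / rs) * d   # Fletcher-Reeves style direction update
        rs = rs_new
    return w


def final_mse(dtype, n_samples=200, n_features=20, iters=30, seed=0):
    """Run CG in the given dtype and return the final MSE (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features)).astype(dtype)
    w_true = rng.standard_normal(n_features)
    # Teacher-generated targets: the MSE minimum is zero in exact arithmetic
    # (up to the rounding of y to the working dtype).
    y = (X.astype(np.float64) @ w_true).astype(dtype)
    w = cg_least_squares(X, y, iters)
    # Measure the residual in float64 so the measurement is not the bottleneck.
    resid = X.astype(np.float64) @ w.astype(np.float64) - y.astype(np.float64)
    return float(np.mean(resid ** 2))


if __name__ == "__main__":
    for dtype in (np.float32, np.float64):
        print(np.dtype(dtype).name, final_mse(dtype))
```

On this well-conditioned toy problem both precisions converge quickly, and the difference appears only in the error floor each one reaches. Whether that floor matters for a given application is the question the paper examines on real network architectures.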

