LG · Jan 31, 2023

Training with Mixed-Precision Floating-Point Assignments

Stanford
arXiv:2301.13464v2 · 9 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses memory efficiency for deep learning practitioners. It offers an incremental improvement over prior low-precision training methods: it avoids the training divergence that some of them cause while giving similar or better memory reduction.

The paper tackles the memory-accuracy tradeoff in training deep neural networks by generating precision assignments, which map each tensor to a high or low precision level. On image classification with CIFAR-10, CIFAR-100, and ImageNet, it typically achieves more than 2x memory reduction over a baseline precision assignment while preserving training accuracy.

When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment -- a mapping from all tensors (arising in training) to precision levels (high or low) -- that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy tradeoff by generating precision assignments for convolutional neural networks that (i) use less memory and (ii) lead to more accurate convolutional networks at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet. Our method typically provides > 2x memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.
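To make the notion of a precision assignment concrete, here is a minimal Python sketch of one as a plain mapping from tensor names to "high" or "low", together with the memory it implies. It is not the paper's algorithm for choosing an assignment. The tensor names, element counts, and bit widths (16-bit high, 8-bit low) are assumptions made for illustration, and the ratio it prints describes this toy example only, not the paper's reported results.

```python
from typing import Dict

# Assumed bit widths: the abstract mentions 8-bit floats as low precision
# and 16- or 32-bit floats as high precision; 16/8 is chosen here.
BITS = {"high": 16, "low": 8}


def memory_bits(sizes: Dict[str, int], assignment: Dict[str, str]) -> int:
    """Total bits needed to store every tensor at its assigned precision."""
    return sum(n * BITS[assignment[name]] for name, n in sizes.items())


# Hypothetical tensors (name -> number of elements), for illustration only.
sizes = {
    "conv1.weight": 1_728,
    "conv1.act": 65_536,
    "conv2.act": 32_768,
    "fc.weight": 5_120,
}

# Reference assignment: every tensor in high precision.
all_high = {name: "high" for name in sizes}

# A hand-picked mixed assignment: the two large activation tensors go to
# low precision, the small weight tensors stay in high precision.
mixed = dict(all_high)
mixed["conv1.act"] = "low"
mixed["conv2.act"] = "low"

reduction = memory_bits(sizes, all_high) / memory_bits(sizes, mixed)
print(f"memory reduction vs. all-high: {reduction:.2f}x")
```

Because most of the memory sits in a few large tensors, moving only those to low precision already recovers most of the possible saving. Which tensors can be moved without hurting accuracy is the question the paper's technique addresses.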

