LG | AI | Feb 21, 2022

Survey on Large Scale Neural Network Training

arXiv:2202.10435v1 | 15 citations
Originality: Synthesis-oriented
AI Analysis

The survey gives researchers and practitioners a comprehensive overview of scalability issues in neural network training. Its contribution is incremental, since it synthesizes existing literature rather than proposing new methods.

This survey systematically reviews approaches to enable more efficient training of large-scale deep neural networks by addressing memory and computational constraints, summarizing techniques and comparing strategies across categories.

Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit on a single GPU device or can be trained only with a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with a single GPU or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations.
