LG | AI | DC | Apr 7, 2023

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

arXiv:2304.03589v1 | 57 citations | h-index: 42 | Has Code
Originality: Synthesis-oriented
AI Analysis

The review addresses computational bottlenecks in deep learning training for researchers and practitioners, but its contribution is incremental: it synthesizes existing methods rather than introducing new ones.

This literature review tackles the problem of efficiently training large-scale deep learning models by providing a comprehensive taxonomy of acceleration techniques across data-centric, model-centric, optimization-centric, budgeted, and system-centric perspectives, aiming to reduce computational complexity and improve training efficiency.

Deep learning has made significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. Although numerous studies have explored efficient training as demands on computational capacity grow, a comprehensive summary of techniques for accelerating the training of deep learning models is still much needed. In this survey, we present a detailed review of training acceleration. We start from the fundamental update formulation and split its basic components into five main perspectives: (1) data-centric, including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric, including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which accelerate training by reducing the calculations on parameters; (3) optimization-centric, including the selection of the learning rate, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which concern the training policy and improving the generality of large-scale models; (4) budgeted training, including some distinctive acceleration methods for resource-constrained situations; (5) system-centric, including efficient open-source distributed libraries and systems that provide adequate hardware support for implementing acceleration algorithms. This taxonomy offers a comprehensive view of the general mechanisms within each component and their joint interaction.
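The abstract refers to a "fundamental update formulation" without reproducing it here. As a minimal sketch, assuming that formulation resembles a plain minibatch SGD update (an assumption, not the survey's own notation), the code below marks where each perspective would act on a training loop. The function `sgd_train`, its parameters, and the toy dataset are hypothetical names chosen for illustration.

```python
import random


def sgd_train(data, w0=0.0, lr=0.1, batch_size=4, steps=200, seed=0):
    """Minibatch SGD on 1-D least squares: loss(w; x, y) = 0.5 * (w*x - y)**2.

    Illustrative only; this is an assumed stand-in for the survey's
    update formulation, not a reproduction of it.
    """
    rng = random.Random(seed)
    w = w0
    w_avg = w0  # running mean of iterates, a simple form of model averaging
    # Budgeted training: `steps` plays the role of a fixed compute budget.
    for t in range(1, steps + 1):
        # Data-centric: which samples enter the batch (uniform sampling here).
        batch = rng.sample(data, batch_size)
        # Model-centric: the cost of evaluating the model and its gradient.
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        # Optimization-centric: learning rate, batch size, and averaging.
        w = w - lr * grad
        w_avg += (w - w_avg) / t
    # System-centric concerns (distributed execution) are out of scope here.
    return w, w_avg


# Noise-free toy data generated from y = 3 * x, so the optimum is w = 3.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0, -1.0, -0.5, 0.25, 1.25)]
w, w_avg = sgd_train(data)
```

Each commented line corresponds to one knob that the taxonomy groups separately: changing the sampler, the gradient computation, or the step rule alters a different term of the same update, which is one way to read the abstract's claim that the components interact.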
