LG · MS · Apr 1, 2021

Optimizer Fusion: Efficient Training with Better Locality and Parallelism

arXiv:2104.00237v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses training efficiency for ML practitioners with a plug-in technique that reduces training time without altering the optimizer algorithm. The contribution is incremental, since it builds on existing methods.

The paper tackles training-time overhead in machine learning frameworks by proposing optimizer fusion, which reorders forward computation, gradient calculation, and parameter updating to improve data locality and computation parallelism. The authors report up to a 20% reduction in training time across various configurations.

Machine learning frameworks adopt iterative optimizers to train neural networks. Conventional eager execution separates the updating of trainable parameters from forward and backward computations. However, this approach introduces nontrivial training time overhead due to the lack of data locality and computation parallelism. In this work, we propose to fuse the optimizer with forward or backward computation to better leverage locality and parallelism during training. By reordering the forward computation, gradient calculation, and parameter updating, our proposed method improves the efficiency of iterative optimizers. Experimental results demonstrate up to a 20% reduction in training time across various configurations. Since our methods do not alter the optimizer algorithm, they can be used as a general "plug-in" technique for the training process.
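The reordering described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes plain SGD on a chain of bias-free linear layers, and every function name is hypothetical; it is not the authors' implementation, which targets real training frameworks. The sketch contrasts three schedules. The conventional schedule runs a separate optimizer pass after the full backward pass. The backward-fused schedule updates each layer as soon as its weight gradient is ready. The forward-fused schedule defers each layer's update into the next iteration's forward pass, just before that layer's weights are read.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def layer_backward(W: Matrix, x: Vector, grad_y: Vector):
    """For y = W x, return (dL/dW, dL/dx), both computed from the current W."""
    grad_W = [[gy * xi for xi in x] for gy in grad_y]
    grad_x = [sum(W[r][c] * grad_y[r] for r in range(len(W))) for c in range(len(x))]
    return grad_W, grad_x


def sgd_update(W: Matrix, grad_W: Matrix, lr: float) -> None:
    for r, row in enumerate(grad_W):
        for c, g in enumerate(row):
            W[r][c] -= lr * g


def forward(weights: List[Matrix], x: Vector) -> List[Vector]:
    acts = [x]  # acts[i] is the input to layer i
    for W in weights:
        acts.append(matvec(W, acts[-1]))
    return acts


def loss_grad(y: Vector, target: Vector) -> Vector:
    # L = 0.5 * ||y - target||^2, so dL/dy = y - target
    return [yi - ti for yi, ti in zip(y, target)]


def train_unfused(weights, data, lr):
    for x, t in data:
        acts = forward(weights, x)
        grad = loss_grad(acts[-1], t)
        grads = [None] * len(weights)
        for i in reversed(range(len(weights))):
            grads[i], grad = layer_backward(weights[i], acts[i], grad)
        for W, g in zip(weights, grads):  # separate optimizer pass over all layers
            sgd_update(W, g, lr)


def train_backward_fused(weights, data, lr):
    for x, t in data:
        acts = forward(weights, x)
        grad = loss_grad(acts[-1], t)
        for i in reversed(range(len(weights))):
            g_W, grad = layer_backward(weights[i], acts[i], grad)
            # dL/dx was already computed with the old W, so updating now is safe
            sgd_update(weights[i], g_W, lr)


def train_forward_fused(weights, data, lr):
    pending = [None] * len(weights)  # gradients waiting to be applied
    for x, t in data:
        acts = [x]
        for i, W in enumerate(weights):
            if pending[i] is not None:
                sgd_update(W, pending[i], lr)  # deferred update, right before W is read
            acts.append(matvec(W, acts[-1]))
        grad = loss_grad(acts[-1], t)
        for i in reversed(range(len(weights))):
            pending[i], grad = layer_backward(weights[i], acts[i], grad)
    for W, g in zip(weights, pending):  # flush the last iteration's updates
        if g is not None:
            sgd_update(W, g, lr)


def make_problem(seed: int = 0):
    rng = random.Random(seed)
    dims = [4, 3, 2]  # layer i maps dims[i] -> dims[i + 1]
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(dims[i])]
                for _ in range(dims[i + 1])] for i in range(len(dims) - 1)]
    data = [([rng.uniform(-1, 1) for _ in range(dims[0])],
             [rng.uniform(-1, 1) for _ in range(dims[-1])]) for _ in range(5)]
    return weights, data


if __name__ == "__main__":
    results = []
    for train in (train_unfused, train_backward_fused, train_forward_fused):
        weights, data = make_problem()
        train(weights, data, lr=0.1)
        results.append(weights)
    print(results[0] == results[1] == results[2])  # prints True
```

All three schedules perform the same floating-point operations in a different order, so they yield identical weights. This is consistent with the claim that fusion does not alter the optimizer algorithm. The one ordering constraint in the backward-fused schedule is that a layer's input gradient must be computed from the old weights before those weights are overwritten. In a real framework, any speedup would come from touching each parameter while it is still in cache and from overlapping updates with the remaining backward or forward work. This toy sketch does not model either effect.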
