DC, LG · Jul 14, 2021

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

arXiv:2107.06925v522.5186 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses efficiency bottlenecks in large-scale neural network training for AI researchers and practitioners, representing an incremental improvement over existing pipeline parallelism methods.

The paper tackles the challenge of training large deep learning models at scale by proposing Chimera, a novel bidirectional pipeline parallelism scheme that reduces bubbles by up to 50% and improves training throughput by 1.16x-2.34x over state-of-the-art methods for a GPT-2 model with 1.3 billion parameters.

Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme that combines bidirectional pipelines to train large-scale models efficiently. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%. Thanks to its scheduling of bidirectional pipelines, Chimera also has more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves training throughput by 1.16x-2.34x over state-of-the-art synchronous and asynchronous pipeline approaches.
