DC LG · Apr 21, 2020

torchgpipe: On-the-fly Pipeline Parallelism for Training Giant Models

Chiheon Kim, Heungsub Lee, Myungryong Jeong, Woonhyuk Baek, Boogeon Yoon, Ildoo Kim, Sungbin Lim, Sungwoong Kim

arXiv:2004.09910v115.860 citations · Has Code

Originality: Incremental advance

AI Analysis

torchgpipe gives researchers and practitioners a ready-to-use way to scale model training in PyTorch, but the contribution is incremental because it builds on existing GPipe concepts.

The authors tackled the problem of training giant models by developing torchgpipe, a PyTorch library for micro-batch pipeline parallelism with checkpointing, demonstrating its efficiency on architectures like AmoebaNet-D and U-Net.

We design and implement a ready-to-use library in PyTorch for performing micro-batch pipeline parallelism with checkpointing proposed by GPipe (Huang et al., 2019). In particular, we develop a set of design components to enable pipeline-parallel gradient computation in PyTorch's define-by-run and eager execution environment. We show that each component is necessary to fully benefit from pipeline parallelism in such an environment, and demonstrate the efficiency of the library by applying it to various network architectures including AmoebaNet-D and U-Net. Our library is available at https://github.com/kakaobrain/torchgpipe.
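To make the idea of micro-batch pipeline parallelism concrete, the following is a minimal pure-Python sketch of a GPipe-style forward schedule. It is an illustration only, not the library's code: the function names `clock_cycles` and `bubble_fraction` are hypothetical, the sketch covers only the forward pass, and it ignores the backward pass and checkpoint recomputation. It assumes the model is split into `num_partitions` sequential stages and the mini-batch into `num_microbatches` chunks, so that partition `j` can process micro-batch `i` at clock tick `i + j`.

```python
from typing import Iterator, List, Tuple


def clock_cycles(
    num_microbatches: int, num_partitions: int
) -> Iterator[List[Tuple[int, int]]]:
    """Yield, for each clock tick, the (micro-batch, partition) pairs
    that can run concurrently in a forward-only pipeline.

    Hypothetical helper for illustration; assumes partition j needs the
    output of partition j - 1 for the same micro-batch.
    """
    m, n = num_microbatches, num_partitions
    for tick in range(m + n - 1):
        # At this tick, partition j works on micro-batch (tick - j),
        # provided that index is within [0, m).
        yield [
            (tick - j, j)
            for j in range(max(0, tick - m + 1), min(tick + 1, n))
        ]


def bubble_fraction(num_microbatches: int, num_partitions: int) -> float:
    """Fraction of (tick, partition) slots left idle by the schedule."""
    m, n = num_microbatches, num_partitions
    total_slots = n * (m + n - 1)  # every partition at every tick
    busy_slots = m * n             # each micro-batch visits each partition once
    return (total_slots - busy_slots) / total_slots


if __name__ == "__main__":
    for tick, tasks in enumerate(clock_cycles(3, 2)):
        print(tick, tasks)
    print(bubble_fraction(3, 2))
```

Under these assumptions the idle fraction is (n - 1) / (m + n - 1), so it shrinks as the number of micro-batches grows relative to the number of partitions. This is why splitting a mini-batch into more micro-batches improves device utilization.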
