DC, LG · May 10, 2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs

Yuanzhong Xu, HyoukJoong Lee, Dehao Chen, Blake Hechtman, Yanping Huang, Rahul Joshi, Maxim Krikun, Dmitry Lepikhin, Andy Ly, Marcello Maggioni, Ruoming Pang, Noam Shazeer

arXiv:2105.04663v2 · 27.6 · 182 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of efficiently scaling large ML models for production use. The advance is incremental: it builds on existing parallelization paradigms, offering a more general and user-friendly approach.

The paper tackles the problem of parallelizing machine learning computations across multiple devices. It introduces GSPMD, an automatic, compiler-based system that scales single-device programs with minimal user annotations, and it reports 50% to 62% compute utilization on up to 2048 Cloud TPUv3 cores for models with up to one trillion parameters.

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations. It allows users to write programs in the same way as for a single device, then give hints through a few annotations on how to distribute tensors, based on which GSPMD will parallelize the computation. Its representation of partitioning is simple yet general, allowing it to express different or mixed paradigms of parallelism on a wide variety of models. GSPMD infers the partitioning for every operator based on limited user annotations, making it convenient to scale existing single-device programs. It solves several technical challenges for production usage, allowing GSPMD to achieve 50% to 62% compute utilization on up to 2048 Cloud TPUv3 cores for models with up to one trillion parameters.
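The idea of annotating how tensors are distributed and then running the same program on every device can be illustrated with a toy sketch. This is not GSPMD's API or implementation: plain Python lists stand in for tensors, a loop stands in for the devices, and all helper names (`shard`, `unshard`, `spmd_matmul`) are hypothetical. It only shows that one matrix multiply, written once, gives the single-device result under three different sharding annotations.

```python
# Toy illustration only: nested lists stand in for tensors and a loop
# stands in for devices. None of these names come from GSPMD or XLA.

def matmul(a, b):
    """Single-device matrix multiply on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shard(tensor, dim, num_devices):
    """Split a 2-D tensor evenly along `dim`; dim=None means replicate."""
    if dim is None:
        return [tensor] * num_devices
    if dim == 0:
        step = len(tensor) // num_devices
        return [tensor[i * step:(i + 1) * step] for i in range(num_devices)]
    step = len(tensor[0]) // num_devices
    return [[row[i * step:(i + 1) * step] for row in tensor] for i in range(num_devices)]

def unshard(shards, dim):
    """Reassemble per-device shards along `dim` (0 = rows, 1 = columns)."""
    if dim == 0:
        return [row for s in shards for row in s]
    return [sum((s[r] for s in shards), []) for r in range(len(shards[0]))]

def spmd_matmul(x, w, x_dim, w_dim, num_devices):
    """Run the same matmul on every device's local shards, then combine."""
    xs = shard(x, x_dim, num_devices)
    ws = shard(w, w_dim, num_devices)
    local = [matmul(xs[d], ws[d]) for d in range(num_devices)]
    if x_dim == 0 and w_dim is None:   # batch dimension split (data parallelism)
        return unshard(local, 0)
    if x_dim is None and w_dim == 1:   # output features split (model parallelism)
        return unshard(local, 1)
    if x_dim == 1 and w_dim == 0:      # contracted dimension split: sum the partial results
        return [[sum(vals) for vals in zip(*rows)] for rows in zip(*local)]
    raise ValueError("sharding combination not covered by this toy sketch")

x = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2 x 4 input
w = [[1, 0], [0, 1], [1, 1], [2, 0]]    # 4 x 2 weights
reference = matmul(x, w)                # what a single device would compute

for x_dim, w_dim in [(0, None), (None, 1), (1, 0)]:
    assert spmd_matmul(x, w, x_dim, w_dim, num_devices=2) == reference
```

In this sketch the sharding is supplied for both operands of a single operator. The abstract describes something stronger: the compiler infers the partitioning of every operator in the graph from a few such annotations.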
