LG AI DC | Feb 1, 2023

TAPAS: Fast and Automatic Derivation of Tensor Parallel Strategies for Large Neural Networks

Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Chencan Wu, Yong Li, Xiaokui Xiao, Wei Lin, Jialin Li

arXiv:2302.00247v27.74 citations | h-index: 69

Originality: Incremental advance

AI Analysis

This work addresses how poorly automatic parallel-strategy search scales in distributed training. It is aimed at AI researchers and engineers, and offers an incremental improvement over existing auto-parallel systems.

The paper tackles the challenge of automatically determining an optimal tensor parallel strategy for large neural networks. It proposes TAPAS, a framework that reduces search complexity with a divide-and-conquer approach: it identifies the unique substructures of a network and avoids searching repeated ones again. The reported search speed is up to 160x faster than state-of-the-art automatic parallelism frameworks, and the derived strategies remain competitive with the expert-engineered Megatron-LM library.

Tensor parallelism is an essential technique for distributed training of large neural networks. However, automatically determining an optimal tensor parallel strategy is challenging because the search space is enormous and grows exponentially with model size and tensor dimension. This prohibits the adoption of auto-parallel systems on larger models. We observe that neural networks usually contain repeated substructures, and build an automatic parallelism framework named TAPAS that eliminates redundant search efforts. TAPAS employs a divide-and-conquer approach that efficiently folds the search space by identifying the unique substructures. As a result, it runs at sub-linear complexity with respect to the model size, making it a scalable solution for training large-scale networks. Our evaluations demonstrate that TAPAS outperforms the state-of-the-art automatic parallelism frameworks by up to $160\times$ in search speed on a wide range of models, and that the performance of the derived strategies is competitive with, or even better than, that of the expert-engineered Megatron-LM library.
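The folding idea in the abstract can be illustrated with a minimal sketch. This is not the TAPAS implementation: the block representation, the `SPLITS` choices, the `toy_cost` function, and all function names are hypothetical, chosen only to show how searching each unique substructure once and reusing the result keeps the search cost tied to the number of unique substructures rather than to model depth.

```python
from itertools import product

# Hypothetical per-operator sharding choices (an assumption for illustration).
SPLITS = ("replicate", "split_row", "split_col")


def make_block(hidden):
    """A hypothetical two-matmul block, described only by its weight shapes."""
    return (("matmul", (hidden, 4 * hidden)), ("matmul", (4 * hidden, hidden)))


def toy_cost(block, plan, num_devices):
    """Made-up cost: per-device weight size plus a flat resharding penalty."""
    cost = 0.0
    for (_, (rows, cols)), split in zip(block, plan):
        size = rows * cols
        cost += size if split == "replicate" else size / num_devices
    for prev, cur in zip(plan, plan[1:]):
        # Assumption: only a column split followed by a row split avoids a reshard.
        if (prev, cur) != ("split_col", "split_row"):
            cost += 1e6
    return cost


def search_block(block, num_devices):
    """Exhaustively search one block: len(SPLITS) ** len(block) candidates."""
    candidates = product(SPLITS, repeat=len(block))
    return min(candidates, key=lambda plan: toy_cost(block, plan, num_devices))


def fold_and_search(model, num_devices):
    """Search each structurally unique block once, then reuse its plan."""
    cache = {}
    for block in model:
        if block not in cache:  # blocks are hashable tuples
            cache[block] = search_block(block, num_devices)
    return [cache[block] for block in model], len(cache)


# 32 blocks in total, but only two distinct structures.
model = [make_block(1024)] * 24 + [make_block(2048)] * 8
plans, searches = fold_and_search(model, num_devices=8)
print(searches, plans[0])
```

In this toy setting the 32-block model triggers only two block-level searches of nine candidates each, whereas a joint search over all 64 operators would face `3 ** 64` candidates. A real system would also need to account for communication between neighbouring blocks, which this sketch ignores.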
