DCLG · Nov 8, 2021

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

arXiv:2111.04867v4 · 129 citations
Originality: Highly original
AI Analysis

This work addresses a critical performance bottleneck for machine learning practitioners who train large models on distributed hardware, and it proposes a novel method rather than an incremental improvement.

The paper tackles the communication-collective bottleneck in multi-GPU training with TACCL, a tool that synthesizes efficient collective algorithms guided by communication sketches. The synthesized algorithms run up to 6.7x faster than NCCL and speed up end-to-end training of models such as Transformer-XL and BERT by up to 2.3x.
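To make "communication collective" concrete, here is a minimal sketch of what AllReduce and AlltoAll compute, plus the classic ring AllReduce, one well-known hand-designed algorithm (NCCL uses ring and tree algorithms). The ring schedule below is not an algorithm synthesized by TACCL. It only shows that a collective is a specification and an algorithm is a schedule of chunk transfers that satisfies it, which is the kind of object TACCL searches for. The toy model assumes one element per chunk, one chunk per rank, and that all sends in a step happen simultaneously; the function names are hypothetical.

```python
from typing import List

Buffers = List[List[int]]  # buffers[r] is rank r's buffer


def allreduce_reference(inputs: Buffers) -> Buffers:
    """Specification: every rank ends with the elementwise sum over all ranks."""
    total = [sum(vals) for vals in zip(*inputs)]
    return [list(total) for _ in inputs]


def alltoall_reference(inputs: Buffers) -> Buffers:
    """Specification: chunk j of rank i lands in slot i of rank j (a transpose)."""
    n = len(inputs)
    return [[inputs[i][j] for i in range(n)] for j in range(n)]


def ring_allreduce(inputs: Buffers) -> Buffers:
    """Classic ring AllReduce: n-1 reduce-scatter steps, then n-1 allgather steps.

    Toy assumption: each buffer holds exactly n chunks (one element each).
    """
    n = len(inputs)
    assert all(len(b) == n for b in inputs), "toy version needs n chunks per rank"
    bufs = [list(b) for b in inputs]

    # Reduce-scatter: in step s, rank r sends chunk (r - s) mod n to rank r+1,
    # which adds it to its own copy. Sends are snapshotted before applying.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, bufs[r][(r - s) % n]) for r in range(n)]
        for r, c, v in sends:
            bufs[(r + 1) % n][c] += v

    # Now rank r holds the fully reduced chunk (r + 1) mod n.
    # Allgather: in step s, rank r forwards chunk (r + 1 - s) mod n to rank r+1,
    # which overwrites its copy with the reduced value.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, bufs[r][(r + 1 - s) % n]) for r in range(n)]
        for r, c, v in sends:
            bufs[(r + 1) % n][c] = v

    return bufs


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(ring_allreduce(data) == allreduce_reference(data))
    print(alltoall_reference(data))
```

Integers are used so the ring result can be compared exactly with the reference; with floating point, the different summation order would only match up to rounding.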

Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication collectives such as AlltoAll and AllReduce, which can become a significant bottleneck in training large models. Thus, it is important to use efficient algorithms for collective communication. We develop TACCL, a tool that enables algorithm designers to guide a synthesizer into automatically generating algorithms for a given hardware configuration and communication collective. TACCL uses a novel communication sketch abstraction to get crucial information from the designer to significantly reduce the search space and guide the synthesizer towards better algorithms. TACCL also uses a novel encoding of the problem that allows it to scale beyond single-node topologies. We use TACCL to synthesize algorithms for three collectives and two hardware topologies: DGX-2 and NDv2. We demonstrate that the algorithms synthesized by TACCL outperform the Nvidia Collective Communication Library (NCCL) by up to 6.7x. We also show that TACCL can speed up end-to-end training of Transformer-XL and BERT models by 11% to 2.3x for different batch sizes.
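As a rough intuition for why designer input shrinks the search space, the toy below counts the simple routes a chunk could take from GPU 0 to GPU 1 in an 8-GPU system. It compares a fully connected graph with a graph in which a hypothetical "sketch" allows only ring links. This is an illustration under strong assumptions, not TACCL's encoding: the paper's synthesizer uses its own problem encoding, and its sketches carry more information than a list of permitted links. The function name and the 8-GPU ring are assumptions chosen for the example.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def count_simple_paths(n: int, edges: Iterable[Tuple[int, int]], src: int, dst: int) -> int:
    """Count loop-free routes from src to dst over undirected links (brute-force DFS)."""
    adj: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def dfs(node: int, visited: Set[int]) -> int:
        if node == dst:
            return 1
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt, visited)
                visited.remove(nxt)
        return total

    return dfs(src, {src})


if __name__ == "__main__":
    n = 8
    full = list(combinations(range(n), 2))           # every GPU pair linked
    ring = [(i, (i + 1) % n) for i in range(n)]     # hypothetical sketch: ring links only
    print(count_simple_paths(n, full, 0, 1))
    print(count_simple_paths(n, ring, 0, 1))
```

On the fully connected graph a single chunk already has 1957 candidate routes (the sum over k = 0..6 of 6!/(6-k)!), while the ring restriction leaves 2. A synthesizer choosing routes and timings for many chunks at once faces a product of such choices, which is why pruning links up front matters.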

Code implementations: 2 repos