LG | May 20, 2023

GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training

arXiv:2305.12201v2 | 11.59 citations | Has Code

Originality: Incremental advance

AI Analysis

This work targets the communication bottleneck that deep learning practitioners face in distributed training. Its adaptive approach is an incremental advance over existing gradient compression methods.

The paper tackles communication overhead in distributed deep learning training. It proposes GraVAC, a framework that dynamically adjusts the gradient compression factor based on model progress and the gradient information lost to compression. GraVAC reduces end-to-end training time by up to 6.67x relative to a static compression factor, while matching or exceeding the accuracy of dense SGD in the same number of iterations/epochs.

Distributed data-parallel (DDP) training improves overall application throughput as multiple devices train on a subset of data and aggregate updates to produce a globally shared model. The periodic synchronization at each iteration incurs considerable overhead, exacerbated by the increasing size and complexity of state-of-the-art neural networks. Although many gradient compression techniques propose to reduce communication cost, the ideal compression factor that leads to maximum speedup or minimum data exchange remains an open-ended problem since it varies with the quality of compression, model size and structure, hardware, network topology and bandwidth. We propose GraVAC, a framework to dynamically adjust compression factor throughout training by evaluating model progress and assessing gradient information loss associated with compression. GraVAC works in an online, black-box manner without any prior assumptions about a model or its hyperparameters, while achieving the same or better accuracy than dense SGD (i.e., no compression) in the same number of iterations/epochs. As opposed to using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM by 4.32x, 1.95x and 6.67x respectively. Compared to other adaptive schemes, our framework provides 1.94x to 5.63x overall speedup.
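The idea of adjusting the compression factor from measured information loss can be illustrated with a small sketch. This is not GraVAC's actual algorithm, which the abstract does not spell out. It assumes Top-k sparsification as the compressor, uses the fraction of squared gradient norm that survives compression as a stand-in for "information loss", and applies a simple multiplicative rule. The function names, the 0.9 threshold, the step size and the factor bounds are all hypothetical choices made for illustration.

```python
import math


def topk_compress(grad, factor):
    """Keep the ceil(len(grad) / factor) largest-magnitude entries; zero the rest."""
    k = max(1, math.ceil(len(grad) / factor))
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def retained_energy(grad, compressed):
    """Fraction of the squared L2 norm of grad that survives compression."""
    total = sum(g * g for g in grad)
    if total == 0.0:
        return 1.0  # an all-zero gradient loses nothing
    return sum(c * c for c in compressed) / total


def adapt_factor(factor, retained, threshold=0.9, step=2.0,
                 min_factor=1.0, max_factor=1000.0):
    """Hypothetical rule: compress harder while little is lost, back off otherwise."""
    if retained >= threshold:
        return min(factor * step, max_factor)
    return max(factor / step, min_factor)


# Toy gradient in which two entries carry almost all of the energy.
grad = [4.0, -3.0, 0.1, 0.2, -0.1, 0.05, 0.0, 0.1]
factor = 4.0
compressed = topk_compress(grad, factor)
retained = retained_energy(grad, compressed)
factor = adapt_factor(factor, retained)
```

In a real training loop, a check like this would run periodically on each worker's gradients before communication, and a framework of this kind would also track model progress (for example, the loss trend) before committing to a more aggressive factor.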
