DC · AI · LG · OC · ML · Mar 30, 2024

Communication Efficient Distributed Training with Distributed Lion

arXiv:2404.00438v1 · 18 citations · h-index: 16 · NIPS
Originality: Highly original
AI Analysis

This work addresses the communication bottleneck in training large AI models, offering a practical improvement over existing methods such as deep gradient compression.

The paper tackles the high communication cost in distributed training by introducing Distributed Lion, which communicates binary or lower-precision vectors to reduce bandwidth while achieving performance comparable to standard Lion or AdamW on vision and language tasks.
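To make the bandwidth claim concrete, here is a minimal sketch of why a sign-valued update is cheap to send: each entry needs one bit rather than a 32-bit float. The helper names `pack_signs` and `unpack_signs` are hypothetical, and the one-bit encoding (which maps a zero entry to -1) is an assumption made for illustration, not a wire format described in the paper.

```python
import numpy as np

def pack_signs(update: np.ndarray) -> np.ndarray:
    """Pack a +1/-1 vector into bytes, one bit per entry (hypothetical helper).

    Entries > 0 become bit 1; everything else (including 0) becomes bit 0.
    """
    bits = (update > 0).astype(np.uint8)
    return np.packbits(bits)

def unpack_signs(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the +1/-1 vector of length n from its packed form."""
    bits = np.unpackbits(packed)[:n]
    return bits.astype(np.float32) * 2.0 - 1.0

rng = np.random.default_rng(0)
n_params = 1_000_000

# A stand-in for one worker's binary update vector.
signs = np.where(rng.standard_normal(n_params) > 0, 1.0, -1.0).astype(np.float32)

packed = pack_signs(signs)
dense_bytes = signs.nbytes    # 4,000,000 bytes as float32
packed_bytes = packed.nbytes  # 125,000 bytes as packed bits

print(f"float32: {dense_bytes} B, packed: {packed_bytes} B, "
      f"ratio: {dense_bytes // packed_bytes}x")
```

Under these assumptions the payload per worker shrinks by a factor of 32 relative to sending float32 gradients, before any further compression.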

The Lion optimizer has emerged as a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, Distributed Lion only requires communicating binary or lower-precision vectors between the workers and the central server, significantly reducing the communication cost. Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, Distributed Lion attains performance comparable to standard Lion or AdamW optimizers applied to aggregated gradients, but with significantly reduced communication bandwidth. This feature is particularly advantageous for training large models. We also demonstrate that Distributed Lion achieves a more favorable performance-bandwidth trade-off than existing efficient distributed methods such as deep gradient compression and ternary gradients.
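The scheme described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' implementation: each worker keeps its own Lion momentum and sends only the sign of its update, and the server combines those signs either by majority vote (a binary result) or by averaging (a low-precision result). The function names, the two aggregation modes, the hyperparameter values, and the toy quadratic objective are all assumptions introduced here.

```python
import numpy as np

def worker_step(grad, momentum, beta1=0.9, beta2=0.99):
    """One local Lion step: return the sign update and the new momentum."""
    delta = np.sign(beta1 * momentum + (1.0 - beta1) * grad)
    new_momentum = beta2 * momentum + (1.0 - beta2) * grad
    return delta, new_momentum

def server_aggregate(deltas, mode="majority"):
    """Combine the workers' sign vectors (assumed aggregation rules)."""
    stacked = np.stack(deltas)
    if mode == "majority":
        # Element-wise majority vote; an exact tie yields 0 (no update).
        return np.sign(stacked.sum(axis=0))
    if mode == "average":
        # Mean of signs: values in [-1, 1] with only a few distinct levels.
        return stacked.mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")

def apply_update(params, agg, lr, weight_decay=0.0):
    """Apply the aggregated update with decoupled weight decay."""
    return params - lr * (agg + weight_decay * params)

# Toy problem (assumption): minimise 0.5 * ||x - target||^2 with noisy gradients.
rng = np.random.default_rng(0)
target = np.array([1.0, -0.5, 0.25])
x = np.zeros(3)
n_workers, lr, steps = 4, 1e-3, 2000
momenta = [np.zeros(3) for _ in range(n_workers)]

def loss(v):
    return 0.5 * float(np.sum((v - target) ** 2))

initial_loss = loss(x)
for _ in range(steps):
    deltas = []
    for i in range(n_workers):
        # Each worker sees its own noisy gradient of the shared objective.
        grad = (x - target) + 0.1 * rng.standard_normal(3)
        delta, momenta[i] = worker_step(grad, momenta[i])
        deltas.append(delta)  # only this sign vector would be communicated
    x = apply_update(x, server_aggregate(deltas, mode="majority"), lr)
final_loss = loss(x)

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In this sketch only `delta` crosses the network in the worker-to-server direction, and with majority vote the server's reply is again sign-valued, which is where the bandwidth saving would come from. The momentum never leaves the worker.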

