LG | AI | Sep 26, 2025

Partial Parameter Updates for Efficient Distributed Training

arXiv:2509.22418v1 | 1 citation | h-index: 54
Originality: Incremental advance
AI Analysis

This paper addresses communication bottlenecks in distributed training of large-scale models and offers an incremental improvement over existing low-communication approaches.

The paper makes low-communication distributed training cheaper in memory and compute by restricting backpropagation so that each node updates only a fixed subset of parameters during local steps. Experiments on a 1.3B-parameter language model across 32 nodes show that it matches the perplexity of prior low-communication methods while cutting training FLOPs and peak memory.

We introduce a memory- and compute-efficient method for low-communication distributed training. Existing methods reduce communication by performing multiple local updates between infrequent global synchronizations. We demonstrate that their efficiency can be significantly improved by restricting backpropagation: instead of updating all the parameters, each node updates only a fixed subset while keeping the remainder frozen during local steps. This constraint substantially reduces peak memory usage and training FLOPs, while a full forward pass over all parameters eliminates the need for cross-node activation exchange. Experiments on a $1.3$B-parameter language model trained across $32$ nodes show that our method matches the perplexity of prior low-communication approaches under identical token and bandwidth budgets while reducing training FLOPs and peak memory.
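The scheme described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's implementation. It uses a small linear regression model in pure Python, and every name in it (`forward`, `local_steps`, `synchronize`, `train`) is hypothetical. It also assumes that synchronization simply averages each parameter over the nodes that trained it; the paper's actual merge rule, subset assignment, and optimizer may differ.

```python
import random

def forward(w, x):
    # Full forward pass: uses every parameter, trainable or frozen.
    return sum(wi * xi for wi, xi in zip(w, x))

def local_steps(w, owned, data, lr, steps, rng):
    """Run SGD steps on one node, updating only the indices in `owned`."""
    w = list(w)  # local copy of the global parameters
    for _ in range(steps):
        x, y = rng.choice(data)
        err = forward(w, x) - y
        # Restricted backward pass: gradients are computed and applied
        # only for the owned coordinates; the rest stay frozen.
        for i in owned:
            w[i] -= lr * err * x[i]
    return w

def synchronize(global_w, node_ws, owners):
    """Assumed merge rule: average each parameter over the nodes that own it."""
    new_w = list(global_w)
    for i in range(len(global_w)):
        trained = [node_ws[n][i] for n, owned in enumerate(owners) if i in owned]
        if trained:
            new_w[i] = sum(trained) / len(trained)
    return new_w

def train(num_nodes=4, dim=8, rounds=50, steps=10, lr=0.05, seed=0):
    rng = random.Random(seed)
    true_w = [rng.uniform(-1, 1) for _ in range(dim)]
    data = []
    for _ in range(200):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        data.append((x, forward(true_w, x)))
    # Fixed assignment (an assumption): node n owns indices n, n + num_nodes, ...
    owners = [set(range(n, dim, num_nodes)) for n in range(num_nodes)]
    w = [0.0] * dim
    for _ in range(rounds):
        # Many local steps per node, then one infrequent global synchronization.
        node_ws = [local_steps(w, owners[n], data, lr, steps, rng)
                   for n in range(num_nodes)]
        w = synchronize(w, node_ws, owners)
    loss = sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)
    return w, loss

if __name__ == "__main__":
    w, loss = train()
    print("final mean squared error:", loss)
```

Each node stores gradients only for the coordinates it owns, which is where the memory and FLOP savings would come from in a real model, while the forward pass still reads all parameters locally, so no activations need to cross nodes.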

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
