LG · Apr 12

Communication-Efficient Gluon in Federated Learning

Xun Qian, Alexander Gaponov, Grigory Malinovsky, Peter Richtárik

arXiv:2604.10689 · 83.8 · h-index: 13

Predicted impact: top 12% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For practitioners training large models in distributed settings, this work offers a practical optimizer that reduces communication overhead while maintaining convergence, though the improvements are incremental over existing compressed optimization methods.

This paper studies communication-efficient variants of Gluon, an extension of the Muon optimizer, for federated learning. It derives convergence rates and improved communication cost under layer-wise smoothness with compressed communication, and numerical experiments support the communication-efficiency claims.

Recent developments have shown that Muon-type optimizers based on linear minimization oracles (LMOs) over non-Euclidean norm balls have the potential to outperform Adam-type methods in practice when training large language models. Since large-scale neural networks are trained across many machines, communication cost becomes the bottleneck. To address this bottleneck, we investigate Gluon, an extension of Muon to the more general layer-wise $(L^0, L^1)$-smooth setting, with both unbiased and contractive compressors. To reduce the compression error, we employ the variance-reduction technique of SARAH in our compressed methods. Convergence rates and improved communication cost are established under certain conditions. As a byproduct, we obtain a new variance-reduced algorithm with a faster convergence rate than Gluon. We also incorporate momentum variance reduction (MVR) into these compressed algorithms and derive comparable communication cost under weaker conditions when $L_i^1 \neq 0$. Finally, several numerical experiments verify the superior performance of our compressed algorithms in terms of communication cost.
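
The abstract refers to two compressor classes without defining them. As a rough illustration only, the sketch below uses the textbook examples of each: Rand-k as an unbiased compressor (E[C(x)] = x) and Top-k as a contractive one (||C(x) - x||^2 <= (1 - k/d) ||x||^2). The function names `rand_k`, `top_k`, and `sq_norm` are hypothetical, and the abstract does not say which compressors the paper actually uses.

```python
import random

def top_k(x, k):
    """Contractive compressor (assumed example): keep the k largest-magnitude entries."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out

def rand_k(x, k, rng):
    """Unbiased compressor (assumed example): keep k random entries, scaled by d/k."""
    d = len(x)
    out = [0.0] * d
    for i in rng.sample(range(d), k):
        out[i] = x[i] * d / k  # rescaling makes the expectation equal to x
    return out

def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)

x = [3.0, -1.0, 0.5, 4.0, -2.0, 0.1]
k = 2

# Contraction property of Top-k: the error is at most (1 - k/d) * ||x||^2.
err = sq_norm([a - b for a, b in zip(x, top_k(x, k))])
print(err <= (1 - k / len(x)) * sq_norm(x))

# Empirical check of unbiasedness: the average of many Rand-k outputs approaches x.
rng = random.Random(0)
trials = 20000
mean = [0.0] * len(x)
for _ in range(trials):
    mean = [m + v / trials for m, v in zip(mean, rand_k(x, k, rng))]
print([round(m, 1) for m in mean])
```

Both compressors send only k of d coordinates per round, which is where the communication saving comes from. The price is compression error, which the abstract says is reduced with SARAH-style variance reduction.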
