LG · Nov 8, 2025

On the Convergence and Stability of Distributed Sub-model Training

arXiv:2511.06132v1 · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of training large models efficiently in federated learning, though it appears incremental because it builds on existing sub-model training and shuffling techniques.

The paper tackles poor convergence in federated learning when clients train randomly sampled sub-models on-device. It proposes a distributed shuffled sub-model training method, establishes its convergence rate, and uses stability analysis to show improved generalization.

As learning models continue to grow in size, enabling on-device local training of these models has emerged as a critical challenge in federated learning. A popular solution is sub-model training, where the server distributes only randomly sampled sub-models to the edge clients, and clients update only these small models. However, such random sampling of sub-models may not give satisfactory convergence performance. In this paper, motivated by the success of SGD with shuffling, we propose distributed shuffled sub-model training: the full model is partitioned into several sub-models in advance, the server shuffles these sub-models and sends each of them to clients in every round, and at the end of the local update period the clients send back the updated sub-models, which the server then averages. We establish the convergence rate of this algorithm. We also study the generalization of distributed sub-model training via stability analysis, and find that sub-model training can improve generalization by amplifying the stability of the training process. Extensive experiments also validate our theoretical findings.
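The round structure described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the model is a linear least-squares regressor, each sub-model is a fixed, disjoint block of parameter coordinates, a fresh permutation is drawn every round, client `c` receives sub-model `order[c % K]` (so several clients may share a sub-model), and the server averages each block over the clients that trained it. The function names (`partition`, `local_update`, `shuffled_submodel_training`) and all hyperparameters are hypothetical.

```python
import random

# Toy sketch (hypothetical names): a linear least-squares model whose parameter
# vector is split into fixed, disjoint coordinate blocks ("sub-models").


def partition(dim, num_blocks):
    """Split indices 0..dim-1 into num_blocks contiguous, disjoint blocks (done once, in advance)."""
    size, rem = divmod(dim, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        end = start + size + (1 if b < rem else 0)
        blocks.append(list(range(start, end)))
        start = end
    return blocks


def local_update(w, block, data, lr, steps, rng):
    """Client side: SGD on 0.5 * (x.w - y)^2, updating only the coordinates in `block`."""
    w = list(w)  # local copy; coordinates outside the block stay frozen
    for _ in range(steps):
        x, y = rng.choice(data)
        err = sum(xi * wi for xi, wi in zip(x, w)) - y
        for j in block:
            w[j] -= lr * err * x[j]
    return [w[j] for j in block]  # only the updated sub-model is sent back


def shuffled_submodel_training(client_data, dim, num_blocks, rounds,
                               lr=0.05, steps=5, seed=0):
    """Server side: shuffle sub-models each round, dispatch them, then average per block."""
    rng = random.Random(seed)
    blocks = partition(dim, num_blocks)
    w = [0.0] * dim
    for _ in range(rounds):
        order = list(range(num_blocks))
        rng.shuffle(order)  # assumption: a fresh permutation every round
        sums = [[0.0] * len(blk) for blk in blocks]
        counts = [0] * num_blocks
        for c, data in enumerate(client_data):
            b = order[c % num_blocks]  # client c receives sub-model b
            new_block = local_update(w, blocks[b], data, lr, steps, rng)
            sums[b] = [s + v for s, v in zip(sums[b], new_block)]
            counts[b] += 1
        for b, blk in enumerate(blocks):
            if counts[b]:  # a block no client trained keeps its old values
                for j, s in zip(blk, sums[b]):
                    w[j] = s / counts[b]
    return w


if __name__ == "__main__":
    # Noiseless synthetic data: 4 clients, 4 parameters, 2 sub-models.
    data_rng = random.Random(1)
    w_true = [1.0, -2.0, 0.5, 3.0]
    clients = []
    for _ in range(4):
        pts = []
        for _ in range(50):
            x = [data_rng.uniform(-1, 1) for _ in range(4)]
            pts.append((x, sum(a * b for a, b in zip(x, w_true))))
        clients.append(pts)
    w = shuffled_submodel_training(clients, dim=4, num_blocks=2, rounds=500, lr=0.1)
    print("max |w - w_true| =", max(abs(a - b) for a, b in zip(w, w_true)))
```

In this toy assignment, whenever there are at least as many clients as sub-models, every block is trained in every round. By contrast, drawing each client's sub-model independently at random can leave some blocks untouched in a given round, which is one intuitive way to see why a shuffled schedule might help convergence.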
