LG ML Nov 25, 2024

Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization

Dun Zeng, Zheshun Wu, Shiyu Liu, Yu Pan, Xiaoying Tang, Zenglin Xu

arXiv:2411.16303v34.62 citations h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the problem of poor generalization in FL for researchers and practitioners, offering incremental insights into hyperparameter tuning and algorithm design.

The paper tackles the challenge of understanding generalization in Federated Learning (FL) under data heterogeneity by introducing a new analysis framework called Libra. The analysis shows that increasing local steps or momentum speeds up convergence but worsens model stability, and reports that the overall result is better excess risk, which the experiments support.
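As background for the terms above, the standard textbook decomposition of excess risk may help show why stability and optimization pull against each other. This is a sketch under assumed notation, not the paper's own statement: $R$ is the population risk, $R_S$ the empirical risk on a training set $S$, $w$ the model returned by the algorithm, and $w^\star$ a fixed minimizer of $R$.

```latex
\begin{aligned}
\mathbb{E}_S\big[R(w) - R(w^\star)\big]
  &= \underbrace{\mathbb{E}_S\big[R(w) - R_S(w)\big]}_{\text{generalization gap (controlled by stability)}}
   + \underbrace{\mathbb{E}_S\big[R_S(w) - R_S(w^\star)\big]}_{\text{optimization error}}
   + \underbrace{\mathbb{E}_S\big[R_S(w^\star)\big] - R(w^\star)}_{=\,0 \text{ for fixed } w^\star}
\end{aligned}
```

The last term vanishes because $w^\star$ does not depend on $S$, so $R_S(w^\star)$ is an unbiased estimate of $R(w^\star)$. In non-convex training the optimization term is usually tracked through gradient norms rather than a loss gap, which is the quantity the summary refers to as convergence.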

Federated Learning (FL) is a distributed learning approach that trains machine learning models across multiple devices while keeping their local data private. However, FL often faces challenges due to data heterogeneity, which leads to inconsistent local optima among clients. These inconsistencies can cause unfavorable convergence behavior and degrade generalization performance. Existing studies often describe this issue through convergence analysis on gradient norms, which focuses on how well a model fits the training data, or through algorithmic stability, which examines the generalization gap. However, neither approach precisely captures the generalization performance of FL algorithms, especially for non-convex neural network training. In response, this paper introduces an innovative generalization dynamics analysis framework, named Libra, for algorithm-dependent excess risk minimization, highlighting the trade-off between model stability and gradient norms. We apply Libra to a standard federated optimization framework and its variants that use server momentum. Through this framework, we show that larger local steps or momentum accelerate the convergence of gradient norms while worsening model stability, yielding better excess risk. Experimental results on standard FL settings support these theoretical insights. These insights can guide hyperparameter tuning and future algorithm design to achieve stronger generalization.
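To make the two hyperparameters in the abstract concrete, here is a minimal sketch of federated averaging with a configurable number of local steps and heavy-ball server momentum, run on synthetic heterogeneous linear-regression clients. It is not the paper's Libra framework or its experimental setup: the function names (`make_clients`, `fedavg_momentum`, `global_loss`), the synthetic data, the server learning rate of 1, and all default values are assumptions chosen for illustration.

```python
import numpy as np


def make_clients(num_clients=4, n=50, dim=5, shift=0.5, seed=0):
    """Synthetic heterogeneous clients: each gets its own true weight vector."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        # Client-specific optimum; `shift` controls the degree of heterogeneity.
        w_true = base + shift * rng.normal(size=dim)
        X = rng.normal(size=(n, dim))
        y = X @ w_true + 0.1 * rng.normal(size=n)
        clients.append((X, y))
    return clients


def grad(w, X, y):
    """Gradient of the local loss 0.5 * mean((Xw - y)^2)."""
    return X.T @ (X @ w - y) / len(y)


def global_loss(w, clients):
    """Average of the local training losses over all clients."""
    return float(np.mean([0.5 * np.mean((X @ w - y) ** 2) for X, y in clients]))


def fedavg_momentum(clients, rounds=30, local_steps=5, lr=0.05, beta=0.0):
    """FedAvg with full participation and heavy-ball server momentum `beta`."""
    dim = clients[0][0].shape[1]
    w = np.zeros(dim)   # global model
    m = np.zeros(dim)   # server momentum buffer
    for _ in range(rounds):
        deltas = []
        for X, y in clients:
            w_local = w.copy()
            for _ in range(local_steps):      # local gradient descent steps
                w_local -= lr * grad(w_local, X, y)
            deltas.append(w_local - w)        # client update sent to the server
        avg_delta = np.mean(deltas, axis=0)   # server-side averaging
        m = beta * m + avg_delta              # momentum on the averaged update
        w = w + m                             # server step (learning rate 1)
    return w


if __name__ == "__main__":
    clients = make_clients()
    for steps, beta in [(1, 0.0), (5, 0.0), (5, 0.5)]:
        w = fedavg_momentum(clients, local_steps=steps, beta=beta)
        print(steps, beta, global_loss(w, clients))
```

Varying `local_steps` and `beta` in this toy setting only shows how the training loss responds. Measuring the stability side of the trade-off would additionally require comparing models trained on perturbed datasets, which this sketch does not attempt.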
