Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness
This work addresses convergence and generalization issues in Federated Learning for privacy-preserving model training, representing an incremental improvement over existing methods.
The paper tackles the problem of data heterogeneity in Federated Learning, which causes local optima divergence and affects convergence, by proposing FedTOGA, a novel algorithm that links local perturbations to global updates to improve generalization and optimization consistency. The result is a 1% accuracy increase and 30% faster convergence compared to state-of-the-art methods.
Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techniques to enhance consistency between global and local generalization and optimization objectives. Nonetheless, the estimation of global SAM introduces additional computational and memory overhead, while dynamic regularization suffers from bias in the local and global dual variables due to training isolation. In this paper, we propose a novel FL algorithm, FedTOGA, designed to consider optimization and generalization objectives while maintaining minimal uplink communication overhead. By linking local perturbations to global updates, global generalization consistency is improved. Additionally, global updates are used to correct local dynamic regularizers, reducing dual variables bias and enhancing optimization consistency. Global updates are passively received by clients, reducing overhead. We also propose neighborhood perturbation to approximate local perturbation, analyzing its strengths and limitations. Theoretical analysis shows FedTOGA achieves faster convergence $O(1/T)$ under non-convex functions. Empirical studies demonstrate that FedTOGA outperforms state-of-the-art algorithms, with a 1\% accuracy increase and 30\% faster convergence, achieving state-of-the-art.