Revisiting Gradient Staleness: Evaluating Distance Metrics for Asynchronous Federated Learning Aggregation
This work targets practitioners deploying asynchronous federated learning systems, where gradient staleness degrades the convergence and accuracy of the global model.
This paper explores alternative distance metrics beyond Euclidean distance to measure gradient staleness in asynchronous federated learning. By integrating these metrics into the aggregation process, the authors demonstrate that certain metrics lead to more robust and efficient training, improving convergence speed, model performance, and training stability.
In asynchronous federated learning (FL), client devices send updates to a central server at varying times depending on their computational speed, and these updates are often computed from stale versions of the global model. This staleness can degrade the convergence and accuracy of the global model. Previous work, such as AsyncFedED, proposed an adaptive aggregation method that uses Euclidean distance to measure staleness. In this paper, we extend this approach by exploring alternative distance metrics that capture the effect of gradient staleness more accurately. We integrate these metrics into the aggregation process and evaluate their impact on convergence speed, model performance, and training stability with heterogeneous clients and non-IID data. Our results demonstrate that certain metrics lead to more robust and efficient asynchronous FL training, offering a stronger foundation for practical deployment.
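To make the idea of distance-based staleness concrete, the following is a minimal sketch, not the method of AsyncFedED or of this paper. It assumes staleness is measured as the distance between the current global model and the version the client started from, and that the server discounts the update by 1 / (1 + staleness). The function names, the discount rule, and the choice of Manhattan and cosine distance as alternatives to Euclidean are all illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two flattened parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance; an illustrative alternative metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; ignores magnitude, compares direction only."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: two zero vectors are identical.
        return 0.0 if norm_a == norm_b else 1.0
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def staleness_weight(global_now: Vector, global_at_pull: Vector,
                     distance: Distance) -> float:
    """Hypothetical discount: the further the global model has drifted
    since the client pulled it, the smaller the weight of the update."""
    return 1.0 / (1.0 + distance(global_now, global_at_pull))


def aggregate(global_now: Vector, global_at_pull: Vector,
              client_update: Vector, distance: Distance) -> List[float]:
    """Apply one asynchronous client update, scaled by its staleness weight."""
    w = staleness_weight(global_now, global_at_pull, distance)
    return [g + w * u for g, u in zip(global_now, client_update)]


# Usage: swap the metric without changing the aggregation rule.
current = [1.0, 2.0]        # global model when the update arrives
pulled = [1.0, 0.0]         # global model the client trained from
update = [3.0, -3.0]        # client's parameter delta
new_global = aggregate(current, pulled, update, euclidean)
```

Passing the metric as a parameter keeps the aggregation rule fixed while the staleness measure varies, which mirrors the kind of comparison the abstract describes.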