Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds
For distributed machine learning practitioners, this paper offers a rigorous theoretical foundation for understanding the fundamental limits of Byzantine resilience in the presence of data heterogeneity and stochasticity.
This work provides a unified convergence analysis for Byzantine-robust distributed SGD, establishing tight error bounds for nonconvex and PL objectives under general data heterogeneity. It shows that local momentum reduces stochasticity-induced error and proves the bounds are tight via matching lower bounds.
Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak-Lojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in our analysis are tight and characterize the fundamental limits of Byzantine resilience under stochasticity and data heterogeneity. Empirical results support our theoretical findings.