LG · Mar 12

Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness

arXiv:2603.12512 · 56.5
AI Analysis

This paper addresses robust distributed learning for systems vulnerable to adversarial (Byzantine) attacks. It is an incremental improvement built on specific algorithmic enhancements.

The paper tackles distributed optimization under Byzantine attacks for $(L_0, L_1)$-smooth functions, whose gradient Lipschitz constants are state-dependent. It proposes Byz-NSGDM, which achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor, and reports experiments on heterogeneous MNIST classification, a synthetic $(L_0, L_1)$-smooth problem, and character-level language modeling.
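For reference, the condition behind "state-dependent gradient Lipschitz constants" is commonly written as below. This is the usual second-order form for twice-differentiable $f$, stated here as an assumption; the paper may work with a first-order or otherwise modified variant.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x .
$$

Setting $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$. For $L_1 > 0$ the curvature bound grows with the gradient norm, which is one motivation for normalized updates.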

We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0,L_1)$-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
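The ingredients named in the abstract (worker momentum, Nearest Neighbor Mixing, robust aggregation, normalization) can be sketched as a single server step. This is a minimal illustration, not the authors' implementation: the function names, the `(1 - beta)` momentum convention, and the coordinate-wise median aggregator are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def nnm(vectors, f):
    """Nearest Neighbor Mixing (sketch): replace each vector by the mean of
    its n - f nearest neighbors in Euclidean distance, itself included."""
    n = vectors.shape[0]
    # Pairwise distance matrix of shape (n, n).
    dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)
    # Indices of the n - f closest vectors for every row.
    nearest = np.argsort(dists, axis=1)[:, : n - f]
    return vectors[nearest].mean(axis=1)


def coordinate_median(vectors):
    """Assumed robust aggregator: coordinate-wise median."""
    return np.median(vectors, axis=0)


def byz_nsgdm_step(x, honest_momenta, honest_grads, byz_messages, lr, beta):
    """One hypothetical Byz-NSGDM-style step.

    honest_momenta, honest_grads: arrays of shape (num_honest, d)
    byz_messages: array of shape (f, d) with arbitrary adversarial vectors
    """
    # Honest workers update their local momentum from a stochastic gradient.
    honest_momenta = beta * honest_momenta + (1.0 - beta) * honest_grads
    # The server receives honest momenta together with Byzantine messages.
    messages = np.vstack([honest_momenta, byz_messages])
    f = byz_messages.shape[0]
    # Mix with NNM, then apply the robust aggregator.
    aggregate = coordinate_median(nnm(messages, f))
    # Normalized update: the step length is lr regardless of gradient scale.
    norm = np.linalg.norm(aggregate)
    if norm > 0.0:
        x = x - lr * aggregate / norm
    return x, honest_momenta
```

In a simulation, `byz_nsgdm_step` would be called once per round with fresh stochastic gradients from the honest workers and attack vectors in `byz_messages`. Because the update is normalized, every nonzero step moves the iterate by exactly `lr` in Euclidean norm, so an adversary can influence the direction only through what survives mixing and aggregation.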

